Re: [EXT] [vpp-dev] RFC: buffer manager rework

2019-02-04 Thread Jerin Jacob Kollanukkaran
On Mon, 2019-02-04 at 19:32 +0100, Damjan Marion wrote:


On 4 Feb 2019, at 14:19, Jerin Jacob Kollanukkaran <jer...@marvell.com> wrote:

On Sun, 2019-02-03 at 21:13 +0100, Damjan Marion wrote:
External Email

On 3 Feb 2019, at 20:13, Saxena, Nitin <nitin.sax...@cavium.com> wrote:

Hi Damjan,

See function octeontx_fpa_bufpool_alloc(), called by octeontx_fpa_dequeue(). It's 
a single read instruction to get the pointer to the data.

Yeah, saw that, and today the VPP buffer manager can grab up to 16 buffer indices 
with one instruction, so no big deal here.

Similarly, octeontx_fpa_bufpool_free() is also a single write instruction.

So, if you are able to prove with numbers that the current software solution is 
low-performant and that you are confident that you can do significantly better, 
I will be happy to work with you on implementing support for a hardware buffer 
manager.
First of all, I welcome your patch, as we were also trying to remove the latencies 
seen by memcpy_x4() of the buffer template. As I said earlier, the hardware buffer 
coprocessor is used by other packet engines, hence its support has to be added in 
VPP. I am looking for suggestions for its resolution.

You can hardly get any suggestions from my side if you are ignoring my 
questions, which I asked in my previous email to get a better understanding of 
what your hardware does.

"It is hardware so it is fast" is not a real argument; we need real datapoints 
before investing time into this area.


Adding more details on the HW mempool manager attributes:

1) Semantically, a HW mempool manager is the same as a SW mempool manager.
2) HW mempool managers have "alloc/dequeue" and "free/enqueue" operations like a 
SW mempool manager.
3) HW mempool managers can work with the SW per-core local cache scheme too.
4) User metadata initialization is not done in HW. SW needs to do it before free() 
or after alloc().
5) Typically there is an operation to "don't free" the packet after Tx, which can 
be used as a back end for cloning the packet (aka reference count schemes).
6) How the HW pool manager improves performance:
- MP/MC can work without locks (HW takes care of it internally).
- HW frees the buffer on Tx, unlike the SW mempool case where the core does it. So 
it saves CPU cycles on packet Tx and the cost of bringing the packet into L1 cache 
again.
- On the Rx side, HW allocs/dequeues the packet from the mempool. No SW intervention 
required.

In terms of abstraction, the DPDK mempool manager abstracts SW and HW mempools 
through static struct rte_mempool_ops.
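
For illustration, plugging a HW-backed pool in behind that abstraction looks roughly 
like the sketch below. This is only a minimal outline against DPDK's public 
rte_mempool_ops API; the hw_* callbacks and the "hw_example" name are hypothetical 
placeholders, not actual driver code.

#include <rte_mempool.h>

/* Hypothetical HW pool backend; the hw_* bodies stand in for real
 * driver code (e.g. programming a HW free-pool unit). */
static int
hw_pool_alloc (struct rte_mempool *mp)
{
  /* configure the HW pool to back this mempool */
  return 0;
}

static void
hw_pool_free (struct rte_mempool *mp)
{
  /* release HW pool resources */
}

static int
hw_pool_enqueue (struct rte_mempool *mp, void *const *obj_table, unsigned int n)
{
  /* free/enqueue: hand buffers back to the HW free list */
  return 0;
}

static int
hw_pool_dequeue (struct rte_mempool *mp, void **obj_table, unsigned int n)
{
  /* alloc/dequeue: pull buffer pointers from the HW free list */
  return 0;
}

static unsigned int
hw_pool_get_count (const struct rte_mempool *mp)
{
  /* read HW pool occupancy */
  return 0;
}

static struct rte_mempool_ops hw_pool_ops = {
  .name = "hw_example",
  .alloc = hw_pool_alloc,
  .free = hw_pool_free,
  .enqueue = hw_pool_enqueue,
  .dequeue = hw_pool_dequeue,
  .get_count = hw_pool_get_count,
};

MEMPOOL_REGISTER_OPS (hw_pool_ops);

An application then selects the backend by name with rte_mempool_set_ops_byname(), 
so the same code can run on top of either a SW ring or a HW manager.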

Limitations:
1) Some NPU packet-processing HW can work only with a HW mempool manager (i.e. it 
cannot work with a SW mempool manager, because on Rx the HW goes to the mempool 
manager to alloc a buffer and then form the packet).

Using the DPDK abstraction would enable writing agnostic software which works 
across NPU and CPU models.

VPP is not a DPDK application, so that doesn't work for us. DPDK is just one 
optional device driver access method, and I hear more and more people asking for 
VPP without DPDK.

We can implement hardware buffer manager support in VPP, but honestly I'm not 
convinced that it will bring any huge value and justify the time investment. I 
would like somebody to prove me wrong, but with real data, not with statements 
like "it is hardware so it is faster".

I believe I have listed the HW buffer manager attributes, how it works, and what 
gain it gives (see point 6).
It needs to be done if VPP needs to support NPUs.
In terms of data points, what data points would you like to have?





--
Damjan



Re: [EXT] [vpp-dev] RFC: buffer manager rework

2019-02-04 Thread Damjan Marion via Lists.Fd.Io


> On 4 Feb 2019, at 19:38, Jerin Jacob Kollanukkaran  wrote:
> 
> On Mon, 2019-02-04 at 19:32 +0100, Damjan Marion wrote:
>> 
>> 
>>> On 4 Feb 2019, at 14:19, Jerin Jacob Kollanukkaran >> > wrote:
>>> 
>>> On Sun, 2019-02-03 at 21:13 +0100, Damjan Marion wrote:
 External Email
 
> On 3 Feb 2019, at 20:13, Saxena, Nitin  > wrote:
> 
> Hi Damjan,
> 
> See function octeontx_fpa_bufpool_alloc() called by 
> octeontx_fpa_dequeue(). Its a single read instruction to get the pointer 
> of data.
 
 Yeah saw that, and today vpp buffer manager can grab up to 16 buffer 
 indices with one instructions so no big deal here
 
> Similarly, octeontx_fpa_bufpool_free() is also a single write 
> instruction. 
> 
>> So, If you are able to prove with numbers that current software solution 
>> is low-performant and that you are confident that you can do 
>> significantly better, I will be happy to work with you on implementing 
>> support for hardware buffer manager.
> First of all I welcome your patch as we were also trying to remove 
> latencies seen by memcpy_x4() of buffer template. As I said earlier 
> hardware buffer coprocessor is being used by other packet engines hence 
> the support has to be added in VPP. I am looking for suggestion for its 
> resolution. 
 
 You can hardly get any suggestion from my side if you are ignoring my 
 questions, which I asked in my previous email to get better understanding 
 of what your hardware do.
 
 "It is hardware so it is fast" is not real argument, we need real 
 datapoints before investing time into this area
>>> 
>>> 
>>> Adding more details of HW mempool manger attributes:
>>> 
>>> 1) Semantically HW mempool manager is same as SW mempool manger
>>> 2) HW mempool mangers has "alloc/dequeue" and "free/enqueue" operation as 
>>> SW mempool manager
>>> 3) HW mempool mangers can work with SW per core local cache scheme too
>>> 4) user metadata initialization is not done in HW. SW needs to do before 
>>> free() or after alloc()
>>> 5) Typically it has an operation to "Dont free" the packet after Tx. Which 
>>> can be used as back end to clone the packet(aka reference count schemes)
>>> 6) How does HW pool manger improves the performance:
>>> - MP/MC can work without locks(HW takes care internally)
>>> - HW Frees the buffer on Tx unlike core does in SW mempool case. So it does 
>>> save CPU cycles packet Tx and cost of bringing packet again
>>> in L1 cache.
>>> - On the RX side, HW alloc/dequeue packet from mempool. No SW intervention 
>>> required.
>>> 
>>> In terms of abstraction. DPDK mempool manger does abstract SW and HW 
>>> mempool though static struct rte_mempool_ops.
>>> 
>>> Limitations:
>>> 1) Some NPU packet processing HW can work only with HW mempool manger.(Aka 
>>> it can not work with SW mempool manager
>>> as on the RX, HW looks for mempool manager to alloc and then form the 
>>> packet)
>>> 
>>> Using DPDK abstractions will enable to write agositic software which works 
>>> NPU and CPUs models.
>> 
>> VPP is not DPDK application so that doesn't work for us. DPDK is just one 
>> optional device driver access method
>> and I hear more and more people asking for VPP without DPDK.
>> 
>> We can implement hardware buffer manager support in VPP, but honestly I'm 
>> not convinced that will bring any huge value and 
>> justify time investment. I would like that somebody proves me wrong, but 
>> with real data, not with statements like "it is hardware so it is faster".
> 
> I believe, I have listed the HW buffer manager attributes and how it works 
> and what gain it gives(See point 6)

Let me just confirm: in the DPDK case, you are checking refcnt as part of tx 
enqueue, and marking such buffers with the don't-free flag.
So packets which have refcnt==1 actually never end up in the mempool cache. If 
this is the correct understanding, how does that fit with your statement that it 
can work with the per-core cache scheme?

What happens with packets which are marked as don't-free? How do you deal 
with refcnt decrement? And how do you track them?

> Need to do it if VPP needs to support NPU.

New NPU support in VPP can be done quite easily. It took me less than a week to 
introduce support for Marvell PP2.

> In terms of data point, What data point you would like to have?

Expected performance gain. I.e., today VPP takes roughly 100 clocks/packet for 
the full ip4 forwarding baseline test on x86.
IP4 forwarding means (rx ring enqueue, ethertype lookup, ip4 mandatory checks, 
ip4 lookup, l2 header rewrite, tx enqueue, tx buffer free, counters).
I would like to understand what the numbers are for a similar test on arm today 
and how much of an improvement you expect by implementing the hw buffer manager.
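
(For scale: at ~100 clocks/packet, a core running at 2.5 GHz works out to roughly 
2.5e9 / 100 ≈ 25 Mpps, which lines up with the 24.9 Mpps single-core number quoted 
earlier in this thread for the new buffer manager.)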
 
-- 
Damjan


Re: [EXT] [vpp-dev] RFC: buffer manager rework

2019-02-04 Thread Paul Vinciguerra
+1 VPP without DPDK.


Re: [EXT] [vpp-dev] RFC: buffer manager rework

2019-02-04 Thread Damjan Marion via Lists.Fd.Io


> On 4 Feb 2019, at 14:19, Jerin Jacob Kollanukkaran  wrote:
> 
> On Sun, 2019-02-03 at 21:13 +0100, Damjan Marion wrote:
>> External Email
>> 
>> 
>>> On 3 Feb 2019, at 20:13, Saxena, Nitin >> > wrote:
>>> 
>>> Hi Damjan,
>>> 
>>> See function octeontx_fpa_bufpool_alloc() called by octeontx_fpa_dequeue(). 
>>> Its a single read instruction to get the pointer of data.
>> 
>> Yeah saw that, and today vpp buffer manager can grab up to 16 buffer indices 
>> with one instructions so no big deal here
>> 
>>> Similarly, octeontx_fpa_bufpool_free() is also a single write instruction. 
>>> 
 So, If you are able to prove with numbers that current software solution 
 is low-performant and that you are confident that you can do significantly 
 better, I will be happy to work with you on implementing support for 
 hardware buffer manager.
>>> First of all I welcome your patch as we were also trying to remove 
>>> latencies seen by memcpy_x4() of buffer template. As I said earlier 
>>> hardware buffer coprocessor is being used by other packet engines hence the 
>>> support has to be added in VPP. I am looking for suggestion for its 
>>> resolution. 
>> 
>> You can hardly get any suggestion from my side if you are ignoring my 
>> questions, which I asked in my previous email to get better understanding of 
>> what your hardware do.
>> 
>> "It is hardware so it is fast" is not real argument, we need real datapoints 
>> before investing time into this area
> 
> 
> Adding more details of HW mempool manger attributes:
> 
> 1) Semantically HW mempool manager is same as SW mempool manger
> 2) HW mempool mangers has "alloc/dequeue" and "free/enqueue" operation as SW 
> mempool manager
> 3) HW mempool mangers can work with SW per core local cache scheme too
> 4) user metadata initialization is not done in HW. SW needs to do before 
> free() or after alloc()
> 5) Typically it has an operation to "Dont free" the packet after Tx. Which 
> can be used as back end to clone the packet(aka reference count schemes)
> 6) How does HW pool manger improves the performance:
> - MP/MC can work without locks(HW takes care internally)
> - HW Frees the buffer on Tx unlike core does in SW mempool case. So it does 
> save CPU cycles packet Tx and cost of bringing packet again
> in L1 cache.
> - On the RX side, HW alloc/dequeue packet from mempool. No SW intervention 
> required.
> 
> In terms of abstraction. DPDK mempool manger does abstract SW and HW mempool 
> though static struct rte_mempool_ops.
> 
> Limitations:
> 1) Some NPU packet processing HW can work only with HW mempool manger.(Aka it 
> can not work with SW mempool manager
> as on the RX, HW looks for mempool manager to alloc and then form the packet)
> 
> Using DPDK abstractions will enable to write agositic software which works 
> NPU and CPUs models.

VPP is not a DPDK application, so that doesn't work for us. DPDK is just one 
optional device driver access method, and I hear more and more people asking for 
VPP without DPDK.

We can implement hardware buffer manager support in VPP, but honestly I'm not 
convinced that it will bring any huge value and justify the time investment. I 
would like somebody to prove me wrong, but with real data, not with statements 
like "it is hardware so it is faster".

-- 
Damjan



Re: [EXT] Re: [vpp-dev] RFC: buffer manager rework

2019-02-04 Thread Jerin Jacob Kollanukkaran
So what is your suggestion to support such hardware?

Before I can provide any suggestions I need to better understand what those 
hardware buffer managers do and why they are better than the pure software 
solution we have today.



 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
Is it fair to say that if a mempool is created per worker core per sw_index 
(interface), then the buffer template copy can be avoided even during free (it 
can be done only once at init time)?

The really expensive part of the buffer free operation is bringing the cacheline 
into L1, and we need to do that anyway to verify the reference count of the packet.
At the moment when the data is in L1, simply copying the template does not cost 
much: 1-2 clocks on x86; not sure about arm, but I still expect that it will 
result in 4 128-bit stores.
That was the rationale for resetting the metadata during buffer free.

So to answer your question, having a buffer pool per sw-interface will likely 
improve performance a bit, but it will also cause sub-optimal use of buffer memory.
Such a solution will also have problems scaling, for example if you have 
hundreds of virtual interfaces...
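
As a rough sketch of what "reset the first 64 bytes from a template while the 
line is hot" means (illustrative only, not the actual VPP code; the structure and 
names are simplified stand-ins):

#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the buffer metadata; the first 64-byte
 * cacheline holds current_data, length, flags, reference count, ... */
typedef struct
{
  uint8_t cacheline0[64];
  /* rest of metadata and packet data follow */
} buffer_t;

/* one pre-initialized 64-byte template per buffer pool */
static buffer_t pool_template;

static inline void
buffer_free_one (buffer_t *b, uint32_t *free_list, uint32_t *n_free, uint32_t bi)
{
  /* Checking the reference count already brings this cacheline into L1
   * (refcnt handling elided in this sketch). */

  /* While the line is hot, reset the metadata from the template; on x86
   * this is a handful of vector stores, so a later alloc can hand the
   * buffer out without touching its metadata again. */
  memcpy (b->cacheline0, pool_template.cacheline0, sizeof (b->cacheline0));

  free_list[(*n_free)++] = bi; /* return the buffer index to the pool */
}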



Thanks,
Nitin


From: vpp-dev@lists.fd.io on behalf of Damjan Marion via Lists.Fd.Io <dmarion=me@lists.fd.io>
Sent: Friday, January 25, 2019 10:38 PM
To: vpp-dev
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] RFC: buffer manager rework

External Email

I am very close to the finish line with buffer management rework patch, and 
would like to
ask people to take a look before it is merged.

https://gerrit.fd.io/r/16638

It significantly improves performance of buffer alloc free and introduces numa 
awareness.
On my skylake platinum 8180 system, with native AVF driver observed performance 
improvement is:

- single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz (TB 
off):

old code - dpdk buffer manager: 20.4 Mpps
old code - old native buffer manager: 19.4 Mpps
new code: 24.9 Mpps

With DPDK drivers performance stays same as DPDK is maintaining own internal 
buffer cache.
So major perf gain should be observed in native code like: vhost-user, memif, 
AVF, host stack.

user facing changes:
to change number of buffers:
  old startup.conf:
dpdk { num-mbufs  }
  new startup.conf:
buffers { buffers-per-numa }

Internal changes:
 - free lists are deprecated
 - buffer metadata is always initialised.
 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
 - to support such operation plugin can request external header space - in case 
of DPDK it stores rte_mbuf + rte_mempool_objhdr

I'm still running some tests so possible minor changes are possible, but 
nothing major expected.

--
Damjan


--
Damjan


--
Damjan





Re: [vpp-dev] RFC: buffer manager rework

2019-02-03 Thread Damjan Marion via Lists.Fd.Io
>>>>>  - first 64-bytes of metadata are initialised on free, so buffer alloc is 
>>>>> very fast
>>>>> Is it fair to say if a mempool is created per worker core per sw_index 
>>>>> (interface) then buffer template copy can be avoided even during free (It 
>>>>> can be done only once at init time)
>>>> 
>>>> The really expensive part of buffer free operation is bringing cacheline 
>>>> into L1, and we need to do that to verify reference count of the packet.
>>>> At the moment when data is in L1, simply copying template will not cost 
>>>> much. 1-2 clocks on x86, not sure about arm but still i expect that it 
>>>> will result in 4 128-bit stores.
>>>> That was the rationale for resetting the metadata during buffer free.
>>>> 
>>>> So to answer your question, having buffer per sw-interface will likely 
>>>> improve performance a bit, but it will also cause sub-optimal use of 
>>>> buffer memory.
>>>> Such solution will also have problem in scaling, for example if you have 
>>>> hundreds of virtual interfaces...
>>>> 
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Nitin
>>>>> 
>>>>> From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> 
>>>>> mailto:vpp-dev@lists.fd.io>> on behalf of Damjan 
>>>>> Marion via Lists.Fd.Io >>>> <mailto:dmarion=me@lists.fd.io>>
>>>>> Sent: Friday, January 25, 2019 10:38 PM
>>>>> To: vpp-dev
>>>>> Cc: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>
>>>>> Subject: [vpp-dev] RFC: buffer manager rework
>>>>>  
>>>>> External Email
>>>>> 
>>>>> I am very close to the finish line with buffer management rework patch, 
>>>>> and would like to
>>>>> ask people to take a look before it is merged.
>>>>> 
>>>>> https://gerrit.fd.io/r/16638 <https://gerrit.fd.io/r/16638>
>>>>> 
>>>>> It significantly improves performance of buffer alloc free and introduces 
>>>>> numa awareness.
>>>>> On my skylake platinum 8180 system, with native AVF driver observed 
>>>>> performance improvement is:
>>>>> 
>>>>> - single core, 2 threads, ipv4 base forwarding test, CPU running at 
>>>>> 2.5GHz (TB off):
>>>>> 
>>>>> old code - dpdk buffer manager: 20.4 Mpps
>>>>> old code - old native buffer manager: 19.4 Mpps
>>>>> new code: 24.9 Mpps
>>>>> 
>>>>> With DPDK drivers performance stays same as DPDK is maintaining own 
>>>>> internal buffer cache.
>>>>> So major perf gain should be observed in native code like: vhost-user, 
>>>>> memif, AVF, host stack.
>>>>> 
>>>>> user facing changes:
>>>>> to change number of buffers:
>>>>>   old startup.conf:
>>>>> dpdk { num-mbufs  }
>>>>>   new startup.conf:
>>>>> buffers { buffers-per-numa }
>>>>> 
>>>>> Internal changes:
>>>>>  - free lists are deprecated
>>>>>  - buffer metadata is always initialised.
>>>>>  - first 64-bytes of metadata are initialised on free, so buffer alloc is 
>>>>> very fast
>>>>>  - DPDK mempools are not used anymore, we register custom mempool ops, 
>>>>> and dpdk is taking buffers from VPP
>>>>>  - to support such operation plugin can request external header space - 
>>>>> in case of DPDK it stores rte_mbuf + rte_mempool_objhdr
>>>>> 
>>>>> I'm still running some tests so possible minor changes are possible, but 
>>>>> nothing major expected.
>>>>> 
>>>>> --
>>>>> Damjan
>>>>> 

Re: [vpp-dev] RFC: buffer manager rework

2019-02-03 Thread Nitin Saxena
Hi Damjan,

See function octeontx_fpa_bufpool_alloc(), called by octeontx_fpa_dequeue(). It's 
a single read instruction to get the pointer to the data.
Similarly, octeontx_fpa_bufpool_free() is also a single write instruction.
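
In other words, roughly the following shape, as a sketch only - the register 
offsets and accessors below are hypothetical placeholders, not the actual octeontx 
FPA driver code:

#include <stdint.h>

/* hypothetical MMIO register offsets of a HW free-pool unit */
#define HW_POOL_ALLOC_REG 0x0   /* load pops a buffer address   */
#define HW_POOL_FREE_REG  0x8   /* store pushes one back        */

static inline void *
hw_pool_alloc_one (volatile uint8_t *pool_base)
{
  /* single MMIO read: HW pops an address off its internal free list,
   * returns 0 when the pool is empty */
  uint64_t addr = *(volatile uint64_t *) (pool_base + HW_POOL_ALLOC_REG);
  return (void *) (uintptr_t) addr;
}

static inline void
hw_pool_free_one (volatile uint8_t *pool_base, void *buf)
{
  /* single MMIO write: HW pushes the address back onto its free list,
   * no software free-list bookkeeping */
  *(volatile uint64_t *) (pool_base + HW_POOL_FREE_REG) =
    (uint64_t) (uintptr_t) buf;
}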

So, If you are able to prove with numbers that current software solution is 
low-performant and that you are confident that you can do significantly better, 
I will be happy to work with you on implementing support for hardware buffer 
manager.
First of all, I welcome your patch, as we were also trying to remove the latencies 
seen by memcpy_x4() of the buffer template. As I said earlier, the hardware buffer 
coprocessor is used by other packet engines, hence its support has to be added in 
VPP. I am looking for suggestions for its resolution.

Thanks,
Nitin

On 03-Feb-2019, at 11:39 PM, Damjan Marion via Lists.Fd.Io <dmarion=me@lists.fd.io> wrote:


External Email


On 3 Feb 2019, at 18:38, Nitin Saxena <nitin.sax...@cavium.com> wrote:

Hi Damjan,

Which exact operation do they accelerate?
There are many…basic features are…
- they accelerate fast buffer free and alloc. Single instruction required for 
both operations.

I quickly looked into DPDK octeontx_fpavf_dequeue() and it looks to me much 
more than one instruction.

In case of DPDK, how that works with DPDK mempool cache or are you disabling 
mempool cache completely?

Does single instruction alloc/free include:
 - reference_count check and decrement?
 - user metadata initialization ?

- Free list is maintained by hardware and not software.

Sounds to me that it is slower to program hardware, than to simply add few 
buffer indices to the end of vector but I may be wrong...


Further other co-processors are dependent on buffer being managed by hardware 
instead of software so it is must to add support of hardware mem-pool in VPP. 
Software mempool will not work with other packet engines.

But that can also be handled internally by device driver...

So, If you are able to prove with numbers that current software solution is 
low-performant and that you are confident that you can do significantly better, 
I will be happy to work with you on implementing support for hardware buffer 
manager.


Thanks,
Nitin

On 03-Feb-2019, at 10:34 PM, Damjan Marion via Lists.Fd.Io <dmarion=me@lists.fd.io> wrote:


External Email


On 3 Feb 2019, at 16:58, Nitin Saxena <nsax...@marvell.com> wrote:

Hi Damjan,

I have few queries regarding this patch.

 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
Some of the targets uses hardware memory allocator like OCTEONTx family and 
NXP's dpaa. Those hardware allocators are exposed as dpdk mempools.

Which exact operation do they accelerate?

Now with this change I can see rte_mempool_populate_iova() is not anymore 
called.

Yes, but new code does pretty much the same thing, it populates both elt_list 
and mem_list. Also new code puts IOVA into mempool_objhdr.

So what is your suggestion to support such hardware.

Before I can provide any suggestion I need to understand better what those 
hardware buffer managers do
and why they are better than pure software solution we have today.



 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
Is it fair to say if a mempool is created per worker core per sw_index 
(interface) then buffer template copy can be avoided even during free (It can 
be done only once at init time)

The really expensive part of buffer free operation is bringing cacheline into 
L1, and we need to do that to verify reference count of the packet.
At the moment when data is in L1, simply copying template will not cost much. 
1-2 clocks on x86, not sure about arm but still i expect that it will result in 
4 128-bit stores.
That was the rationale for resetting the metadata during buffer free.

So to answer your question, having buffer per sw-interface will likely improve 
performance a bit, but it will also cause sub-optimal use of buffer memory.
Such solution will also have problem in scaling, for example if you have 
hundreds of virtual interfaces...



Thanks,
Nitin


From: vpp-dev@lists.fd.io on behalf of Damjan Marion via Lists.Fd.Io <dmarion=me@lists.fd.io>
Sent: Friday, January 25, 2019 10:38 PM
To: vpp-dev
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] RFC: buffer manager rework

External Email

I am very close to the finish line with buffer management rework patch, and 
would like to
ask people to take a look before it is merged.

https://gerrit.fd.io/r/16638

It significantly improves performance of buffer alloc free and introduces numa 
awareness.
On my skylake platinum 8180 system, with native AVF driver observed performance 
improvement is:

- single core, 2 threads, ipv4 base forwarding 

Re: [vpp-dev] RFC: buffer manager rework

2019-02-03 Thread Damjan Marion via Lists.Fd.Io


> On 3 Feb 2019, at 18:38, Nitin Saxena  wrote:
> 
> Hi Damjan,
> 
>> Which exact operation do they accelerate?
> There are many…basic features are…
> - they accelerate fast buffer free and alloc. Single instruction required for 
> both operations. 

I quickly looked into DPDK's octeontx_fpavf_dequeue() and it looks to me like much 
more than one instruction.

In the case of DPDK, how does that work with the DPDK mempool cache, or are you 
disabling the mempool cache completely?

Does single-instruction alloc/free include:
 - reference_count check and decrement?
 - user metadata initialization?
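
For reference, the knobs being asked about live at mempool creation time; a sketch 
using DPDK's public API (the "hw_example" ops name is a hypothetical placeholder 
for a HW-backed driver, and cache_size 0 is what "disabling the mempool cache" 
would look like):

#include <rte_mempool.h>

/* Sketch: create an object pool backed by a (hypothetical) HW ops
 * driver. cache_size selects the per-lcore SW cache; 0 disables it. */
static struct rte_mempool *
create_hw_backed_pool (unsigned int n_objs, unsigned int obj_size,
                       unsigned int cache_size, int socket_id)
{
  struct rte_mempool *mp;

  mp = rte_mempool_create_empty ("pkt-pool", n_objs, obj_size,
                                 cache_size, 0, socket_id, 0);
  if (mp == NULL)
    return NULL;

  /* select the pool backend by name ("hw_example" is a placeholder) */
  if (rte_mempool_set_ops_byname (mp, "hw_example", NULL) != 0)
    {
      rte_mempool_free (mp);
      return NULL;
    }

  /* allocate object memory and hand the objects to the backend */
  if (rte_mempool_populate_default (mp) < 0)
    {
      rte_mempool_free (mp);
      return NULL;
    }

  return mp;
}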

> - Free list is maintained by hardware and not software.  

It sounds to me like it is slower to program the hardware than to simply add a few 
buffer indices to the end of a vector, but I may be wrong...

> 
> Further other co-processors are dependent on buffer being managed by hardware 
> instead of software so it is must to add support of hardware mem-pool in VPP. 
> Software mempool will not work with other packet engines.

But that can also be handled internally by the device driver...

So, if you are able to prove with numbers that the current software solution is 
low-performant and that you are confident that you can do significantly better, 
I will be happy to work with you on implementing support for a hardware buffer 
manager.

> 
> Thanks,
> Nitin
> 
>> On 03-Feb-2019, at 10:34 PM, Damjan Marion via Lists.Fd.Io 
>> mailto:dmarion=me@lists.fd.io>> wrote:
>> 
>> External Email
>> 
>> 
>> 
>>> On 3 Feb 2019, at 16:58, Nitin Saxena >> <mailto:nsax...@marvell.com>> wrote:
>>> 
>>> Hi Damjan,
>>> 
>>> I have few queries regarding this patch.
>>> 
>>>  - DPDK mempools are not used anymore, we register custom mempool ops, and 
>>> dpdk is taking buffers from VPP
>>> Some of the targets uses hardware memory allocator like OCTEONTx family and 
>>> NXP's dpaa. Those hardware allocators are exposed as dpdk mempools.
>> 
>> Which exact operation do they accelerate?
>> 
>>> Now with this change I can see rte_mempool_populate_iova() is not anymore 
>>> called.
>> 
>> Yes, but new code does pretty much the same thing, it populates both 
>> elt_list and mem_list. Also new code puts IOVA into mempool_objhdr.
>> 
>>> So what is your suggestion to support such hardware.
>> 
>> Before I can provide any suggestion I need to understand better what those 
>> hardware buffer managers do
>> and why they are better than pure software solution we have today.
>> 
>>>  
>>> 
>>>  - first 64-bytes of metadata are initialised on free, so buffer alloc is 
>>> very fast
>>> Is it fair to say if a mempool is created per worker core per sw_index 
>>> (interface) then buffer template copy can be avoided even during free (It 
>>> can be done only once at init time)
>> 
>> The really expensive part of buffer free operation is bringing cacheline 
>> into L1, and we need to do that to verify reference count of the packet.
>> At the moment when data is in L1, simply copying template will not cost 
>> much. 1-2 clocks on x86, not sure about arm but still i expect that it will 
>> result in 4 128-bit stores.
>> That was the rationale for resetting the metadata during buffer free.
>> 
>> So to answer your question, having buffer per sw-interface will likely 
>> improve performance a bit, but it will also cause sub-optimal use of buffer 
>> memory.
>> Such solution will also have problem in scaling, for example if you have 
>> hundreds of virtual interfaces...
>> 
>> 
>>> 
>>> Thanks,
>>> Nitin
>>> 
>>> From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> >> <mailto:vpp-dev@lists.fd.io>> on behalf of Damjan Marion via Lists.Fd.Io 
>>> mailto:dmarion=me@lists.fd.io>>
>>> Sent: Friday, January 25, 2019 10:38 PM
>>> To: vpp-dev
>>> Cc: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>
>>> Subject: [vpp-dev] RFC: buffer manager rework
>>>  
>>> External Email
>>> 
>>> I am very close to the finish line with buffer management rework patch, and 
>>> would like to
>>> ask people to take a look before it is merged.
>>> 
>>> https://gerrit.fd.io/r/16638 <https://gerrit.fd.io/r/16638>
>>> 
>>> It significantly improves performance of buffer alloc free and introduces 
>>> numa awareness.
>>> On my skylake platinum 8180 system, with native AVF driver observed 
>>> performance improvement is:

Re: [vpp-dev] RFC: buffer manager rework

2019-02-03 Thread Nitin Saxena
Hi Damjan,

Which exact operation do they accelerate?
There are many… the basic features are:
- They accelerate fast buffer free and alloc. A single instruction is required for 
both operations.
- The free list is maintained by hardware, not software.

Furthermore, other co-processors depend on buffers being managed by hardware 
instead of software, so it is a must to add support for hardware mempools in VPP. 
A software mempool will not work with the other packet engines.

Thanks,
Nitin

On 03-Feb-2019, at 10:34 PM, Damjan Marion via Lists.Fd.Io <dmarion=me@lists.fd.io> wrote:


External Email


On 3 Feb 2019, at 16:58, Nitin Saxena <nsax...@marvell.com> wrote:

Hi Damjan,

I have few queries regarding this patch.

 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
Some of the targets uses hardware memory allocator like OCTEONTx family and 
NXP's dpaa. Those hardware allocators are exposed as dpdk mempools.

Which exact operation do they accelerate?

Now with this change I can see rte_mempool_populate_iova() is not anymore 
called.

Yes, but new code does pretty much the same thing, it populates both elt_list 
and mem_list. Also new code puts IOVA into mempool_objhdr.

So what is your suggestion to support such hardware.

Before I can provide any suggestion I need to understand better what those 
hardware buffer managers do
and why they are better than pure software solution we have today.



 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
Is it fair to say if a mempool is created per worker core per sw_index 
(interface) then buffer template copy can be avoided even during free (It can 
be done only once at init time)

The really expensive part of buffer free operation is bringing cacheline into 
L1, and we need to do that to verify reference count of the packet.
At the moment when data is in L1, simply copying template will not cost much. 
1-2 clocks on x86, not sure about arm but still i expect that it will result in 
4 128-bit stores.
That was the rationale for resetting the metadata during buffer free.

So to answer your question, having buffer per sw-interface will likely improve 
performance a bit, but it will also cause sub-optimal use of buffer memory.
Such solution will also have problem in scaling, for example if you have 
hundreds of virtual interfaces...



Thanks,
Nitin


From: vpp-dev@lists.fd.io on behalf of Damjan Marion via Lists.Fd.Io <dmarion=me@lists.fd.io>
Sent: Friday, January 25, 2019 10:38 PM
To: vpp-dev
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] RFC: buffer manager rework

External Email

I am very close to the finish line with buffer management rework patch, and 
would like to
ask people to take a look before it is merged.

https://gerrit.fd.io/r/16638

It significantly improves performance of buffer alloc free and introduces numa 
awareness.
On my skylake platinum 8180 system, with native AVF driver observed performance 
improvement is:

- single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz (TB 
off):

old code - dpdk buffer manager: 20.4 Mpps
old code - old native buffer manager: 19.4 Mpps
new code: 24.9 Mpps

With DPDK drivers performance stays same as DPDK is maintaining own internal 
buffer cache.
So major perf gain should be observed in native code like: vhost-user, memif, 
AVF, host stack.

user facing changes:
to change number of buffers:
  old startup.conf:
dpdk { num-mbufs  }
  new startup.conf:
buffers { buffers-per-numa }

Internal changes:
 - free lists are deprecated
 - buffer metadata is always initialised.
 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
 - to support such operation plugin can request external header space - in case 
of DPDK it stores rte_mbuf + rte_mempool_objhdr

I'm still running some tests so possible minor changes are possible, but 
nothing major expected.

--
Damjan


--
Damjan


Re: [vpp-dev] RFC: buffer manager rework

2019-02-03 Thread Nitin Saxena
Hi Damjan,


I have a few queries regarding this patch.


 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP

Some of the targets use a hardware memory allocator, like the OCTEONTx family and 
NXP's dpaa. Those hardware allocators are exposed as DPDK mempools. Now with this 
change I can see that rte_mempool_populate_iova() is not called anymore. So what 
is your suggestion to support such hardware?

 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
Is it fair to say that if a mempool is created per worker core per sw_index 
(interface), then the buffer template copy can be avoided even during free (it 
can be done only once at init time)?

Thanks,
Nitin



From: vpp-dev@lists.fd.io  on behalf of Damjan Marion via 
Lists.Fd.Io 
Sent: Friday, January 25, 2019 10:38 PM
To: vpp-dev
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] RFC: buffer manager rework

External Email

I am very close to the finish line with buffer management rework patch, and 
would like to
ask people to take a look before it is merged.

https://gerrit.fd.io/r/16638

It significantly improves performance of buffer alloc free and introduces numa 
awareness.
On my skylake platinum 8180 system, with native AVF driver observed performance 
improvement is:

- single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz (TB 
off):

old code - dpdk buffer manager: 20.4 Mpps
old code - old native buffer manager: 19.4 Mpps
new code: 24.9 Mpps

With DPDK drivers performance stays same as DPDK is maintaining own internal 
buffer cache.
So major perf gain should be observed in native code like: vhost-user, memif, 
AVF, host stack.

user facing changes:
to change number of buffers:
  old startup.conf:
dpdk { num-mbufs  }
  new startup.conf:
buffers { buffers-per-numa }

Internal changes:
 - free lists are deprecated
 - buffer metadata is always initialised.
 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
 - to support such operation plugin can request external header space - in case 
of DPDK it stores rte_mbuf + rte_mempool_objhdr

I'm still running some tests so possible minor changes are possible, but 
nothing major expected.

--
Damjan



Re: [vpp-dev] RFC: buffer manager rework

2019-02-03 Thread Damjan Marion via Lists.Fd.Io


> On 3 Feb 2019, at 16:58, Nitin Saxena  wrote:
> 
> Hi Damjan,
> 
> I have few queries regarding this patch.
> 
>  - DPDK mempools are not used anymore, we register custom mempool ops, and 
> dpdk is taking buffers from VPP
> Some of the targets uses hardware memory allocator like OCTEONTx family and 
> NXP's dpaa. Those hardware allocators are exposed as dpdk mempools.

Which exact operation do they accelerate?

> Now with this change I can see rte_mempool_populate_iova() is not anymore 
> called.

Yes, but the new code does pretty much the same thing: it populates both elt_list 
and mem_list. The new code also puts the IOVA into mempool_objhdr.

> So what is your suggestion to support such hardware.

Before I can provide any suggestions I need to better understand what those 
hardware buffer managers do and why they are better than the pure software 
solution we have today.

>  
> 
>  - first 64-bytes of metadata are initialised on free, so buffer alloc is 
> very fast
> Is it fair to say if a mempool is created per worker core per sw_index 
> (interface) then buffer template copy can be avoided even during free (It can 
> be done only once at init time)

The really expensive part of the buffer free operation is bringing the cacheline 
into L1, and we need to do that anyway to verify the reference count of the packet.
At the moment when the data is in L1, simply copying the template does not cost 
much: 1-2 clocks on x86; not sure about arm, but I still expect that it will 
result in 4 128-bit stores.
That was the rationale for resetting the metadata during buffer free.

So to answer your question, having a buffer pool per sw-interface will likely 
improve performance a bit, but it will also cause sub-optimal use of buffer memory.
Such a solution will also have problems scaling, for example if you have 
hundreds of virtual interfaces...


> 
> Thanks,
> Nitin
> 
> From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>  <mailto:vpp-dev@lists.fd.io>> on behalf of Damjan Marion via Lists.Fd.Io 
> mailto:dmarion=me@lists.fd.io>>
> Sent: Friday, January 25, 2019 10:38 PM
> To: vpp-dev
> Cc: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>
> Subject: [vpp-dev] RFC: buffer manager rework
>  
> External Email
> 
> I am very close to the finish line with buffer management rework patch, and 
> would like to
> ask people to take a look before it is merged.
> 
> https://gerrit.fd.io/r/16638 <https://gerrit.fd.io/r/16638>
> 
> It significantly improves performance of buffer alloc free and introduces 
> numa awareness.
> On my skylake platinum 8180 system, with native AVF driver observed 
> performance improvement is:
> 
> - single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz 
> (TB off):
> 
> old code - dpdk buffer manager: 20.4 Mpps
> old code - old native buffer manager: 19.4 Mpps
> new code: 24.9 Mpps
> 
> With DPDK drivers performance stays same as DPDK is maintaining own internal 
> buffer cache.
> So major perf gain should be observed in native code like: vhost-user, memif, 
> AVF, host stack.
> 
> user facing changes:
> to change number of buffers:
>   old startup.conf:
> dpdk { num-mbufs  }
>   new startup.conf:
> buffers { buffers-per-numa }
> 
> Internal changes:
>  - free lists are deprecated
>  - buffer metadata is always initialised.
>  - first 64-bytes of metadata are initialised on free, so buffer alloc is 
> very fast
>  - DPDK mempools are not used anymore, we register custom mempool ops, and 
> dpdk is taking buffers from VPP
>  - to support such operation plugin can request external header space - in 
> case of DPDK it stores rte_mbuf + rte_mempool_objhdr
> 
> I'm still running some tests so possible minor changes are possible, but 
> nothing major expected.
> 
> --
> Damjan
> 

-- 
Damjan



Re: [vpp-dev] RFC: buffer manager rework

2019-01-25 Thread Florin Coras
Awesome! Looking forward to using it in the host stack ;-)

Florin

> On Jan 25, 2019, at 9:08 AM, Damjan Marion via Lists.Fd.Io 
>  wrote:
> 
> 
> I am very close to the finish line with buffer management rework patch, and 
> would like to 
> ask people to take a look before it is merged.
> 
> https://gerrit.fd.io/r/16638
> 
> It significantly improves performance of buffer alloc free and introduces 
> numa awareness.
> On my skylake platinum 8180 system, with native AVF driver observed 
> performance improvement is:
> 
> - single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz 
> (TB off):
> 
> old code - dpdk buffer manager: 20.4 Mpps
> old code - old native buffer manager: 19.4 Mpps
> new code: 24.9 Mpps
> 
> With DPDK drivers performance stays same as DPDK is maintaining own internal 
> buffer cache. 
> So major perf gain should be observed in native code like: vhost-user, memif, 
> AVF, host stack.
> 
> user facing changes:
> to change number of buffers:
>  old startup.conf:
>dpdk { num-mbufs  } 
>  new startup.conf:
>buffers { buffers-per-numa }
> 
> Internal changes:
> - free lists are deprecated
> - buffer metadata is always initialised.
> - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
> fast
> - DPDK mempools are not used anymore, we register custom mempool ops, and 
> dpdk is taking buffers from VPP
> - to support such operation plugin can request external header space - in 
> case of DPDK it stores rte_mbuf + rte_mempool_objhdr
> 
> I'm still running some tests so possible minor changes are possible, but 
> nothing major expected.
> 
> -- 
> Damjan
> 


[vpp-dev] RFC: buffer manager rework

2019-01-25 Thread Damjan Marion via Lists.Fd.Io

I am very close to the finish line with the buffer management rework patch, and 
would like to ask people to take a look before it is merged.

https://gerrit.fd.io/r/16638

It significantly improves the performance of buffer alloc/free and introduces NUMA 
awareness.
On my Skylake Platinum 8180 system, with the native AVF driver, the observed 
performance improvement is:

- single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz (TB 
off):

old code - dpdk buffer manager: 20.4 Mpps
old code - old native buffer manager: 19.4 Mpps
new code: 24.9 Mpps

With DPDK drivers, performance stays the same, as DPDK maintains its own internal 
buffer cache.
So the major perf gain should be observed in native code like: vhost-user, memif, 
AVF, host stack.

user facing changes:
to change number of buffers:
  old startup.conf:
dpdk { num-mbufs  } 
  new startup.conf:
buffers { buffers-per-numa }
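
For example (illustrative values only, not numbers recommended in this thread):

  # old startup.conf
  dpdk { num-mbufs 32768 }

  # new startup.conf
  buffers { buffers-per-numa 16384 }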

Internal changes:
 - free lists are deprecated
 - buffer metadata is always initialised.
 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
 - to support such operations, a plugin can request external header space - in the 
case of DPDK it stores rte_mbuf + rte_mempool_objhdr

I'm still running some tests, so some minor changes are still possible, but 
nothing major is expected.

-- 
Damjan
