Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On Fri, 6 Mar 2026 at 16:42, Ilya Maximets wrote:
> >> And I wonder if it's maybe worth doing 3 levels of encaps instead of 2,
> >> just to be sure. WDYT?
> >
> > Having 2 levels was hard enough to my brain and two (IPv6) vxlan
> > encapsulations are 140 bytes.
> >
> > What are you scared of?
>
> I just think that over-doing this test is safer than trying to guess the
> number. The headroom is configurable in DPDK and the default can also
> potentially change, in which case the test will not check what it supposed
> to check.

It seems more that a check is missing on this side, rather than going
to a 3rd level of tunnels.
I asked Claude to help me on the autoconf aspect, and it seems to work
(I checked with a 64 bytes headroom).
I will add this in the next revision.

--
David Marchand

___
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
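For readers following the headroom arithmetic in this message, here is a minimal sketch of where the 140-byte figure comes from, assuming each VXLAN-over-IPv6 level adds an untagged Ethernet header, a fixed IPv6 header with no extension headers, a UDP header and a VXLAN header. The names (vxlan6_overhead, ASSUMED_HEADROOM) are made up for illustration and are not part of OVS or DPDK; the 128 value is only the default quoted in the commit message, and as noted above it is configurable.

```c
#include <assert.h>

/* Header sizes in bytes for one VXLAN-over-IPv6 encapsulation level. */
enum {
    ETH_HDR_LEN = 14,    /* Outer Ethernet header, no VLAN tag (assumed). */
    IPV6_HDR_LEN = 40,   /* Fixed IPv6 header, no extension headers. */
    UDP_HDR_LEN = 8,
    VXLAN_HDR_LEN = 8,
};

/* Assumption: the 128-byte default headroom quoted in the commit message. */
#define ASSUMED_HEADROOM 128

/* Bytes of headroom consumed by 'levels' nested encapsulations. */
static inline unsigned int
vxlan6_overhead(unsigned int levels)
{
    return levels * (ETH_HDR_LEN + IPV6_HDR_LEN + UDP_HDR_LEN
                     + VXLAN_HDR_LEN);
}

/* One level fits in the assumed headroom, two levels do not. */
static inline void
vxlan6_overhead_demo(void)
{
    assert(vxlan6_overhead(1) == 70);
    assert(vxlan6_overhead(1) <= ASSUMED_HEADROOM);
    assert(vxlan6_overhead(2) == 140);
    assert(vxlan6_overhead(2) > ASSUMED_HEADROOM);
}
```

Calling vxlan6_overhead_demo() from a test main() runs the checks. With a larger configured headroom, two levels could fit again, which is the scenario the extra build-time check is meant to catch.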
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On 3/6/26 3:50 PM, David Marchand wrote:
> On Thu, 5 Mar 2026 at 00:04, Ilya Maximets wrote:
>>
>> On 2/25/26 12:03 PM, David Marchand via dev wrote:
>>> By default, DPDK based dp-packets points to data buffers that can't be
>>> expanded dynamically.
>>> Their layout is as follows:
>>> - a minimum 128 bytes headroom chosen at DPDK build time
>>> (RTE_PKTMBUF_HEADROOM),
>>> - a maximum size chosen at mempool creation,
>>>
>>> In some usecases though (like encapsulating with multiple tunnels),
>>> a 128 bytes headroom is too short.
>>>
>>> Keep on using mono segment packets but dynamically allocate buffers
>>> in DPDK memory and make use of DPDK external buffers API
>>> (previously used for userspace TSO).
>>>
>>> Signed-off-by: David Marchand
>>> ---
>>> Changes since v3:
>>> - split buffer length calculation in a helper,
>>> - handled running test without qdisc (net/tap does not require
>>> those qdiscs, but spews ERR level logs if absent),
>>> - added check on firewall,
>>>
>>> Changes since v2:
>>> - moved check on uint16_t overflow in netdev_dpdk_extbuf_allocate(),
>>>
>>> Changes since v1:
>>> - fixed new segment length (reset by extbuf attach helper),
>>> - added a system-dpdk unit test,
>>>
>>> ---
>>> lib/dp-packet.c | 17 +++-
>>> lib/netdev-dpdk.c| 47
>>> lib/netdev-dpdk.h| 4 +++
>>> tests/system-dpdk.at | 65
>>> 4 files changed, 127 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>>> index c04d608be6..4c45636039 100644
>>> --- a/lib/dp-packet.c
>>> +++ b/lib/dp-packet.c
>>> @@ -255,8 +255,23 @@ dp_packet_resize(struct dp_packet *b, size_t
>>> new_headroom, size_t new_tailroom)
>>> new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>>>
>>> switch (b->source) {
>>> -case DPBUF_DPDK:
>>> +case DPBUF_DPDK: {
>>> +#ifdef DPDK_NETDEV
>>> +uint32_t buf_len;
>>> +
>>> +buf_len = netdev_dpdk_extbuf_size(new_allocated);
>>
>> Shouldn't we assign into new_allocated here? The rte_pktmbuf_attach_extbuf()
>> will update the mbuf->buf_len to the result of this call. However, there
>> is the dp_packet_set_allocated(b, new_allocated); call that will overwrite
>> that value with a potentially smaller 'new_allocated'. I'm not sure if that
>> can cause any issues, since the value is smaller, but it doesn't feel right.
>>
>> Or am I missing something here?
>
> rte_pktmbuf_ext_shinfo_init_helper() stores the struct
> rte_mbuf_ext_shared_info object after the actual data.
> The computed size here under the name buf_len accounts for the length
> that OVS wants + some space for the rte_mbuf_ext_shared_info object
> and some alignment constraint.
> We can't adjust new_allocated to this value.
> I can probably rename the variable.
>
> Now, you are right that OVS may set a smaller value in buf_len later,
> because of dp_packet_set_allocated().
> I don't think it would be an issue, just that we would be wasting some
> space that was allocated because of alignment.
>
> Continuation below...
>
>>
>>> +ovs_assert(buf_len <= UINT16_MAX);
>>> +new_base = netdev_dpdk_extbuf_allocate(buf_len);
>>> +if (!new_base) {
>>> +out_of_memory();
>>> +}
>>> +dp_packet_copy__(b, new_base, new_headroom, new_tailroom);
>>> +netdev_dpdk_extbuf_replace(b, new_base, buf_len);
>
> ... new_allocated can be adjusted here to pkt->mbuf.buf_len after call
> to netdev_dpdk_extbuf_replace() since rte_pktmbuf_attach_extbuf stores
> the right available length in mbuf->buf_len.
>
> It is too late to adjust new_tailroom that was used to copy existing
> data, but is it an issue?
The tailroom is not a real value, it's always calculated from the
allocated size and the base. And the size is already large enough
to fit the requested, so it's fine that it's not fully correct at
the moment of copy. But it feels a little icky if we overwrite the
buf_len in the mbuf with a lower value, so we should probably get
the actual length, just to avoid any weird issues in the future.
Though I agree that it shouldn't be a big problem besides the wasted
bit of memory.
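
To illustrate the point that the tailroom is a derived value, here is a small sketch using a hypothetical, simplified packet model (struct pkt_model is not the OVS struct dp_packet; the field names are assumptions). It shows that recording a larger actual buffer length only grows the tailroom and leaves headroom and data untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a packet buffer; not the OVS
 * structure. */
struct pkt_model {
    size_t allocated;   /* Total bytes available from the buffer base. */
    size_t data_ofs;    /* Offset of the first data byte, i.e. headroom. */
    size_t size;        /* Bytes of packet data in use. */
};

static inline size_t
pkt_model_headroom(const struct pkt_model *p)
{
    return p->data_ofs;
}

/* Tailroom is never stored: it is what remains after headroom and data. */
static inline size_t
pkt_model_tailroom(const struct pkt_model *p)
{
    return p->allocated - p->data_ofs - p->size;
}

static inline void
pkt_model_demo(void)
{
    /* Requested layout: 256 headroom + 1500 data + 64 tailroom = 1820. */
    struct pkt_model p = { .allocated = 1820, .data_ofs = 256, .size = 1500 };

    assert(pkt_model_tailroom(&p) == 64);

    /* Suppose the buffer really offers 8 more bytes.  Recording the actual
     * length changes only the derived tailroom. */
    p.allocated += 8;
    assert(pkt_model_headroom(&p) == 256);
    assert(pkt_model_tailroom(&p) == 72);
}
```

Under this model, copying with the originally requested tailroom and later recording a larger allocated size is harmless, since the copied region is always a subset of the final buffer.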
> (btw, why copy head/tailroom data? those locations should be treated
> as uninitialized data from my pov)
Just to be safe, I guess. The API sort of allows for reserving the
space, so someone might put something in there before realizing that
they need more? I'm not sure if there is any code that does this,
but it might be possible.
>
>
>>> +break;
>>> +#else
>>> OVS_NOT_REACHED();
>>> +#endif
>>> +}
>>>
>>> case DPBUF_MALLOC:
>>> if (new_headroom == dp_packet_headroom(b)) {
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 923191da84..cfd641b493 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -3072,12 +3072,51 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk
>>> *dev, s
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On Thu, 5 Mar 2026 at 00:04, Ilya Maximets wrote:
>
> On 2/25/26 12:03 PM, David Marchand via dev wrote:
> > By default, DPDK based dp-packets points to data buffers that can't be
> > expanded dynamically.
> > Their layout is as follows:
> > - a minimum 128 bytes headroom chosen at DPDK build time
> > (RTE_PKTMBUF_HEADROOM),
> > - a maximum size chosen at mempool creation,
> >
> > In some usecases though (like encapsulating with multiple tunnels),
> > a 128 bytes headroom is too short.
> >
> > Keep on using mono segment packets but dynamically allocate buffers
> > in DPDK memory and make use of DPDK external buffers API
> > (previously used for userspace TSO).
> >
> > Signed-off-by: David Marchand
> > ---
> > Changes since v3:
> > - split buffer length calculation in a helper,
> > - handled running test without qdisc (net/tap does not require
> > those qdiscs, but spews ERR level logs if absent),
> > - added check on firewall,
> >
> > Changes since v2:
> > - moved check on uint16_t overflow in netdev_dpdk_extbuf_allocate(),
> >
> > Changes since v1:
> > - fixed new segment length (reset by extbuf attach helper),
> > - added a system-dpdk unit test,
> >
> > ---
> > lib/dp-packet.c | 17 +++-
> > lib/netdev-dpdk.c| 47
> > lib/netdev-dpdk.h| 4 +++
> > tests/system-dpdk.at | 65
> > 4 files changed, 127 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index c04d608be6..4c45636039 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -255,8 +255,23 @@ dp_packet_resize(struct dp_packet *b, size_t
> > new_headroom, size_t new_tailroom)
> > new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
> >
> > switch (b->source) {
> > -case DPBUF_DPDK:
> > +case DPBUF_DPDK: {
> > +#ifdef DPDK_NETDEV
> > +uint32_t buf_len;
> > +
> > +buf_len = netdev_dpdk_extbuf_size(new_allocated);
>
> Shouldn't we assign into new_allocated here? The rte_pktmbuf_attach_extbuf()
> will update the mbuf->buf_len to the result of this call. However, there
> is the dp_packet_set_allocated(b, new_allocated); call that will overwrite
> that value with a potentially smaller 'new_allocated'. I'm not sure if that
> can cause any issues, since the value is smaller, but it doesn't feel right.
>
> Or am I missing something here?
rte_pktmbuf_ext_shinfo_init_helper() stores the struct
rte_mbuf_ext_shared_info object after the actual data.
The computed size here under the name buf_len accounts for the length
that OVS wants + some space for the rte_mbuf_ext_shared_info object
and some alignment constraint.
We can't adjust new_allocated to this value.
I can probably rename the variable.
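
A rough sketch of the size arithmetic described above, for illustration only. struct shinfo_model is a stand-in whose layout is an assumption, not the real struct rte_mbuf_ext_shared_info, and extbuf_usable_model() assumes the shared info object is placed at the highest pointer-aligned offset that still fits before the end of a pointer-aligned buffer. Under those assumptions, the length left for packet data is at least what OVS asked for, and smaller than the allocated total.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DPDK's shared info object; the layout here is assumed. */
struct shinfo_model {
    void (*free_cb)(void *addr, void *opaque);
    void *opaque;
    uint16_t refcnt;
};

#define ALIGN_CEIL(v, a)  ((((v) + (a) - 1) / (a)) * (a))
#define ALIGN_FLOOR(v, a) (((v) / (a)) * (a))

/* Same arithmetic as netdev_dpdk_extbuf_size() in the patch, applied to the
 * stand-in structure. */
static inline uint32_t
extbuf_size_model(uint32_t data_len)
{
    uint32_t buf_len = data_len;

    buf_len += sizeof(struct shinfo_model) + sizeof(uintptr_t);
    return ALIGN_CEIL(buf_len, (uint32_t) sizeof(uintptr_t));
}

/* Assumed placement: shared info at the end of the buffer, aligned down to
 * pointer size.  Everything before it is usable for packet data. */
static inline uint32_t
extbuf_usable_model(uint32_t total_len)
{
    return ALIGN_FLOOR(total_len - (uint32_t) sizeof(struct shinfo_model),
                       (uint32_t) sizeof(uintptr_t));
}

static inline void
extbuf_model_demo(void)
{
    for (uint32_t data_len = 1; data_len <= 4096; data_len++) {
        uint32_t total = extbuf_size_model(data_len);
        uint32_t usable = extbuf_usable_model(total);

        assert(total > usable);       /* Shared info takes some space. */
        assert(usable >= data_len);   /* The request always fits. */
    }
}
```

The gap between the usable length and data_len in this model corresponds to the alignment slack mentioned in this reply, which is the space that would be wasted if a smaller value is written back later.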
Now, you are right that OVS may set a smaller value in buf_len later,
because of dp_packet_set_allocated().
I don't think it would be an issue, just that we would be wasting some
space that was allocated because of alignment.
Continuation below...
>
> > +ovs_assert(buf_len <= UINT16_MAX);
> > +new_base = netdev_dpdk_extbuf_allocate(buf_len);
> > +if (!new_base) {
> > +out_of_memory();
> > +}
> > +dp_packet_copy__(b, new_base, new_headroom, new_tailroom);
> > +netdev_dpdk_extbuf_replace(b, new_base, buf_len);
... new_allocated can be adjusted here to pkt->mbuf.buf_len after call
to netdev_dpdk_extbuf_replace() since rte_pktmbuf_attach_extbuf stores
the right available length in mbuf->buf_len.
It is too late to adjust new_tailroom that was used to copy existing
data, but is it an issue?
(btw, why copy head/tailroom data? those locations should be treated
as uninitialized data from my pov)
> > +break;
> > +#else
> > OVS_NOT_REACHED();
> > +#endif
> > +}
> >
> > case DPBUF_MALLOC:
> > if (new_headroom == dp_packet_headroom(b)) {
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 923191da84..cfd641b493 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -3072,12 +3072,51 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk
> > *dev, struct rte_mbuf **pkts,
> > return cnt;
> > }
> >
> > +uint32_t
> > +netdev_dpdk_extbuf_size(uint32_t data_len)
> > +{
> > +uint32_t buf_len = data_len;
> > +
> > +buf_len += sizeof(struct rte_mbuf_ext_shared_info) + sizeof(uintptr_t);
> > +buf_len = RTE_ALIGN_CEIL(buf_len, sizeof(uintptr_t));
> > +
> > +return buf_len;
> > +}
> > +
> > +void *
> > +netdev_dpdk_extbuf_allocate(uint32_t buf_len)
> > +{
> > +return rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
> > +}
> > +
> > static void
> > netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque)
> > {
> > rte_free(opaque);
> > }
> >
> > +void
> > +netdev_dpdk_extbuf_replace(struct dp_packet *b, void *buf, uint32_t
> > data_len)
> > +{
> > +struct rte_mbuf *pkt = (struct rte_mbuf *) b;
> > +struct rte_mbuf_ext_sha
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On 3/5/26 12:09 AM, Ilya Maximets wrote:
> On 3/5/26 12:04 AM, Ilya Maximets wrote:
>> On 2/25/26 12:03 PM, David Marchand via dev wrote:
>>> By default, DPDK based dp-packets points to data buffers that can't be
>>> expanded dynamically.
>>> Their layout is as follows:
>>> - a minimum 128 bytes headroom chosen at DPDK build time
>>> (RTE_PKTMBUF_HEADROOM),
>>> - a maximum size chosen at mempool creation,
>>>
>>> In some usecases though (like encapsulating with multiple tunnels),
>>> a 128 bytes headroom is too short.
>>>
>>> Keep on using mono segment packets but dynamically allocate buffers
>>> in DPDK memory and make use of DPDK external buffers API
>>> (previously used for userspace TSO).
>>>
>>> Signed-off-by: David Marchand
>>> ---
>>> Changes since v3:
>>> - split buffer length calculation in a helper,
>>> - handled running test without qdisc (net/tap does not require
>>> those qdiscs, but spews ERR level logs if absent),
>>> - added check on firewall,
>>>
>>> Changes since v2:
>>> - moved check on uint16_t overflow in netdev_dpdk_extbuf_allocate(),
>>>
>>> Changes since v1:
>>> - fixed new segment length (reset by extbuf attach helper),
>>> - added a system-dpdk unit test,
>>>
>>> ---
>>> lib/dp-packet.c | 17 +++-
>>> lib/netdev-dpdk.c| 47
>>> lib/netdev-dpdk.h| 4 +++
>>> tests/system-dpdk.at | 65
>>> 4 files changed, 127 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>>> index c04d608be6..4c45636039 100644
>>> --- a/lib/dp-packet.c
>>> +++ b/lib/dp-packet.c
>>> @@ -255,8 +255,23 @@ dp_packet_resize(struct dp_packet *b, size_t
>>> new_headroom, size_t new_tailroom)
>>> new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>>>
>>> switch (b->source) {
>>> -case DPBUF_DPDK:
>>> +case DPBUF_DPDK: {
>>> +#ifdef DPDK_NETDEV
>>> +uint32_t buf_len;
>>> +
>>> +buf_len = netdev_dpdk_extbuf_size(new_allocated);
>>
>> Shouldn't we assign into new_allocated here? The rte_pktmbuf_attach_extbuf()
>> will update the mbuf->buf_len to the result of this call. However, there
>> is the dp_packet_set_allocated(b, new_allocated); call that will overwrite
>> that value with a potentially smaller 'new_allocated'. I'm not sure if that
>> can cause any issues, since the value is smaller, but it doesn't feel right.
>>
>
> May also need to adjust the new_tailroom.
>
>> Or am I missing something here?
>>
>>> +ovs_assert(buf_len <= UINT16_MAX);
>>> +new_base = netdev_dpdk_extbuf_allocate(buf_len);
>>> +if (!new_base) {
>>> +out_of_memory();
>>> +}
>>> +dp_packet_copy__(b, new_base, new_headroom, new_tailroom);
>>> +netdev_dpdk_extbuf_replace(b, new_base, buf_len);
>>> +break;
>>> +#else
>>> OVS_NOT_REACHED();
>>> +#endif
>>> +}
>>>
>>> case DPBUF_MALLOC:
>>> if (new_headroom == dp_packet_headroom(b)) {
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 923191da84..cfd641b493 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -3072,12 +3072,51 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk
>>> *dev, struct rte_mbuf **pkts,
>>> return cnt;
>>> }
>>>
>>> +uint32_t
>>> +netdev_dpdk_extbuf_size(uint32_t data_len)
>>> +{
>>> +uint32_t buf_len = data_len;
>>> +
>>> +buf_len += sizeof(struct rte_mbuf_ext_shared_info) + sizeof(uintptr_t);
>>> +buf_len = RTE_ALIGN_CEIL(buf_len, sizeof(uintptr_t));
>>> +
>>> +return buf_len;
>>> +}
>>> +
>>> +void *
>>> +netdev_dpdk_extbuf_allocate(uint32_t buf_len)
>>> +{
>>> +return rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
>>> +}
>>> +
>>> static void
>>> netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque)
>>> {
>>> rte_free(opaque);
>>> }
>>>
>>> +void
>>> +netdev_dpdk_extbuf_replace(struct dp_packet *b, void *buf, uint32_t
>>> data_len)
>>> +{
>>> +struct rte_mbuf *pkt = (struct rte_mbuf *) b;
>>> +struct rte_mbuf_ext_shared_info *shinfo;
>>> +uint16_t buf_len = data_len;
>>> +
>>> +shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
>>> +netdev_dpdk_extbuf_free,
>>> +buf);
>>> +ovs_assert(shinfo != NULL);
>>> +
>>> +if (RTE_MBUF_HAS_EXTBUF(pkt)) {
>>> +rte_pktmbuf_detach_extbuf(pkt);
>>> +}
>>> +rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), buf_len,
>>> + shinfo);
>>> +/* OVS only supports mono segment.
>>> + * Packet size did not change, restore the current segment length. */
>>> +pkt->data_len = pkt->pkt_len;
>>> +}
>>> +
>>> static struct rte_mbuf *
>>> dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len)
>>> {
>>> @@ -3086,16 +3125,14 @@ dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt,
>>> uint
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On 3/5/26 12:04 AM, Ilya Maximets wrote:
> On 2/25/26 12:03 PM, David Marchand via dev wrote:
>> By default, DPDK based dp-packets points to data buffers that can't be
>> expanded dynamically.
>> Their layout is as follows:
>> - a minimum 128 bytes headroom chosen at DPDK build time
>> (RTE_PKTMBUF_HEADROOM),
>> - a maximum size chosen at mempool creation,
>>
>> In some usecases though (like encapsulating with multiple tunnels),
>> a 128 bytes headroom is too short.
>>
>> Keep on using mono segment packets but dynamically allocate buffers
>> in DPDK memory and make use of DPDK external buffers API
>> (previously used for userspace TSO).
>>
>> Signed-off-by: David Marchand
>> ---
>> Changes since v3:
>> - split buffer length calculation in a helper,
>> - handled running test without qdisc (net/tap does not require
>> those qdiscs, but spews ERR level logs if absent),
>> - added check on firewall,
>>
>> Changes since v2:
>> - moved check on uint16_t overflow in netdev_dpdk_extbuf_allocate(),
>>
>> Changes since v1:
>> - fixed new segment length (reset by extbuf attach helper),
>> - added a system-dpdk unit test,
>>
>> ---
>> lib/dp-packet.c | 17 +++-
>> lib/netdev-dpdk.c| 47
>> lib/netdev-dpdk.h| 4 +++
>> tests/system-dpdk.at | 65
>> 4 files changed, 127 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>> index c04d608be6..4c45636039 100644
>> --- a/lib/dp-packet.c
>> +++ b/lib/dp-packet.c
>> @@ -255,8 +255,23 @@ dp_packet_resize(struct dp_packet *b, size_t
>> new_headroom, size_t new_tailroom)
>> new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>>
>> switch (b->source) {
>> -case DPBUF_DPDK:
>> +case DPBUF_DPDK: {
>> +#ifdef DPDK_NETDEV
>> +uint32_t buf_len;
>> +
>> +buf_len = netdev_dpdk_extbuf_size(new_allocated);
>
> Shouldn't we assign into new_allocated here? The rte_pktmbuf_attach_extbuf()
> will update the mbuf->buf_len to the result of this call. However, there
> is the dp_packet_set_allocated(b, new_allocated); call that will overwrite
> that value with a potentially smaller 'new_allocated'. I'm not sure if that
> can cause any issues, since the value is smaller, but it doesn't feel right.
>
May also need to adjust the new_tailroom.
> Or am I missing something here?
>
>> +ovs_assert(buf_len <= UINT16_MAX);
>> +new_base = netdev_dpdk_extbuf_allocate(buf_len);
>> +if (!new_base) {
>> +out_of_memory();
>> +}
>> +dp_packet_copy__(b, new_base, new_headroom, new_tailroom);
>> +netdev_dpdk_extbuf_replace(b, new_base, buf_len);
>> +break;
>> +#else
>> OVS_NOT_REACHED();
>> +#endif
>> +}
>>
>> case DPBUF_MALLOC:
>> if (new_headroom == dp_packet_headroom(b)) {
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 923191da84..cfd641b493 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -3072,12 +3072,51 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk
>> *dev, struct rte_mbuf **pkts,
>> return cnt;
>> }
>>
>> +uint32_t
>> +netdev_dpdk_extbuf_size(uint32_t data_len)
>> +{
>> +uint32_t buf_len = data_len;
>> +
>> +buf_len += sizeof(struct rte_mbuf_ext_shared_info) + sizeof(uintptr_t);
>> +buf_len = RTE_ALIGN_CEIL(buf_len, sizeof(uintptr_t));
>> +
>> +return buf_len;
>> +}
>> +
>> +void *
>> +netdev_dpdk_extbuf_allocate(uint32_t buf_len)
>> +{
>> +return rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
>> +}
>> +
>> static void
>> netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque)
>> {
>> rte_free(opaque);
>> }
>>
>> +void
>> +netdev_dpdk_extbuf_replace(struct dp_packet *b, void *buf, uint32_t
>> data_len)
>> +{
>> +struct rte_mbuf *pkt = (struct rte_mbuf *) b;
>> +struct rte_mbuf_ext_shared_info *shinfo;
>> +uint16_t buf_len = data_len;
>> +
>> +shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
>> +netdev_dpdk_extbuf_free,
>> +buf);
>> +ovs_assert(shinfo != NULL);
>> +
>> +if (RTE_MBUF_HAS_EXTBUF(pkt)) {
>> +rte_pktmbuf_detach_extbuf(pkt);
>> +}
>> +rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), buf_len,
>> + shinfo);
>> +/* OVS only supports mono segment.
>> + * Packet size did not change, restore the current segment length. */
>> +pkt->data_len = pkt->pkt_len;
>> +}
>> +
>> static struct rte_mbuf *
>> dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len)
>> {
>> @@ -3086,16 +3125,14 @@ dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt,
>> uint32_t data_len)
>> uint16_t buf_len;
>> void *buf;
>>
>> -total_len += sizeof *shinfo + sizeof(uintptr_t);
>> -total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On 2/25/26 12:03 PM, David Marchand via dev wrote:
> By default, DPDK based dp-packets points to data buffers that can't be
> expanded dynamically.
> Their layout is as follows:
> - a minimum 128 bytes headroom chosen at DPDK build time
> (RTE_PKTMBUF_HEADROOM),
> - a maximum size chosen at mempool creation,
>
> In some usecases though (like encapsulating with multiple tunnels),
> a 128 bytes headroom is too short.
>
> Keep on using mono segment packets but dynamically allocate buffers
> in DPDK memory and make use of DPDK external buffers API
> (previously used for userspace TSO).
>
> Signed-off-by: David Marchand
> ---
> Changes since v3:
> - split buffer length calculation in a helper,
> - handled running test without qdisc (net/tap does not require
> those qdiscs, but spews ERR level logs if absent),
> - added check on firewall,
>
> Changes since v2:
> - moved check on uint16_t overflow in netdev_dpdk_extbuf_allocate(),
>
> Changes since v1:
> - fixed new segment length (reset by extbuf attach helper),
> - added a system-dpdk unit test,
>
> ---
> lib/dp-packet.c | 17 +++-
> lib/netdev-dpdk.c| 47
> lib/netdev-dpdk.h| 4 +++
> tests/system-dpdk.at | 65
> 4 files changed, 127 insertions(+), 6 deletions(-)
>
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index c04d608be6..4c45636039 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -255,8 +255,23 @@ dp_packet_resize(struct dp_packet *b, size_t
> new_headroom, size_t new_tailroom)
> new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>
> switch (b->source) {
> -case DPBUF_DPDK:
> +case DPBUF_DPDK: {
> +#ifdef DPDK_NETDEV
> +uint32_t buf_len;
> +
> +buf_len = netdev_dpdk_extbuf_size(new_allocated);
Shouldn't we assign into new_allocated here? The rte_pktmbuf_attach_extbuf()
will update the mbuf->buf_len to the result of this call. However, there
is the dp_packet_set_allocated(b, new_allocated); call that will overwrite
that value with a potentially smaller 'new_allocated'. I'm not sure if that
can cause any issues, since the value is smaller, but it doesn't feel right.
Or am I missing something here?
> +ovs_assert(buf_len <= UINT16_MAX);
> +new_base = netdev_dpdk_extbuf_allocate(buf_len);
> +if (!new_base) {
> +out_of_memory();
> +}
> +dp_packet_copy__(b, new_base, new_headroom, new_tailroom);
> +netdev_dpdk_extbuf_replace(b, new_base, buf_len);
> +break;
> +#else
> OVS_NOT_REACHED();
> +#endif
> +}
>
> case DPBUF_MALLOC:
> if (new_headroom == dp_packet_headroom(b)) {
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 923191da84..cfd641b493 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3072,12 +3072,51 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk
> *dev, struct rte_mbuf **pkts,
> return cnt;
> }
>
> +uint32_t
> +netdev_dpdk_extbuf_size(uint32_t data_len)
> +{
> +uint32_t buf_len = data_len;
> +
> +buf_len += sizeof(struct rte_mbuf_ext_shared_info) + sizeof(uintptr_t);
> +buf_len = RTE_ALIGN_CEIL(buf_len, sizeof(uintptr_t));
> +
> +return buf_len;
> +}
> +
> +void *
> +netdev_dpdk_extbuf_allocate(uint32_t buf_len)
> +{
> +return rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
> +}
> +
> static void
> netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque)
> {
> rte_free(opaque);
> }
>
> +void
> +netdev_dpdk_extbuf_replace(struct dp_packet *b, void *buf, uint32_t data_len)
> +{
> +struct rte_mbuf *pkt = (struct rte_mbuf *) b;
> +struct rte_mbuf_ext_shared_info *shinfo;
> +uint16_t buf_len = data_len;
> +
> +shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
> +netdev_dpdk_extbuf_free,
> +buf);
> +ovs_assert(shinfo != NULL);
> +
> +if (RTE_MBUF_HAS_EXTBUF(pkt)) {
> +rte_pktmbuf_detach_extbuf(pkt);
> +}
> +rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), buf_len,
> + shinfo);
> +/* OVS only supports mono segment.
> + * Packet size did not change, restore the current segment length. */
> +pkt->data_len = pkt->pkt_len;
> +}
> +
> static struct rte_mbuf *
> dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len)
> {
> @@ -3086,16 +3125,14 @@ dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt,
> uint32_t data_len)
> uint16_t buf_len;
> void *buf;
>
> -total_len += sizeof *shinfo + sizeof(uintptr_t);
> -total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t));
> -
> +total_len = netdev_dpdk_extbuf_size(total_len);
> if (OVS_UNLIKELY(total_len > UINT16_MAX)) {
> VLOG_ERR("Can't copy packet: too big %u", total_len);
> return NULL;
> }
>
> buf_len
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On 2/25/26 1:24 PM, David Marchand wrote:
> On Wed, 25 Feb 2026 at 12:04, David Marchand via dev
> wrote:
>>
>> By default, DPDK based dp-packets points to data buffers that can't be
>> expanded dynamically.
>> Their layout is as follows:
>> - a minimum 128 bytes headroom chosen at DPDK build time
>> (RTE_PKTMBUF_HEADROOM),
>> - a maximum size chosen at mempool creation,
>>
>> In some usecases though (like encapsulating with multiple tunnels),
>> a 128 bytes headroom is too short.
>>
>> Keep on using mono segment packets but dynamically allocate buffers
>> in DPDK memory and make use of DPDK external buffers API
>> (previously used for userspace TSO).
>>
>> Signed-off-by: David Marchand
>
> Here is a strange failure in Cirrus CI (unrelated to the patch afaict).
> Any idea what it could be? Some race?

These are well-known issues. At this point in time, I assume that it's a
bug in FreeBSD's implementation of async io, as I looked at it multiple
times and didn't find OVS doing anything non-compliant with POSIX aio API.
The log writes sometimes are just re-ordered on large outputs...

Best regards, Ilya Maximets.

>
> 487: ovs-ofctl replace-flows with --bundle FAILED (ovs-ofctl.at:3438)
>
> ./ovs-ofctl.at:3438: print_vconn_debug | vconn_sub | ofctl_strip
> --- - 2026-02-25 11:59:36.400451000 +
> +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/487/stdout
> 2026-02-25 11:59:36.399629000 +
> @@ -39,9 +39,9 @@
> bundle_id=0 type=COMMIT_REPLY flags=0
> vconn|DBG|unix: sent (Success): OFPT_HELLO (OF1.5):
> version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
> +vconn|DBG|unix: negotiated OpenFlow version 0x05 (we support version
> 0x06 and earlier, peer supports version 0x05)
> vconn|DBG|unix: received: OFPT_HELLO (OF1.4):
> version bitmap: 0x05
> -vconn|DBG|unix: negotiated OpenFlow version 0x05 (we support version
> 0x06 and earlier, peer supports version 0x05)
> vconn|DBG|unix: received: OFPST_FLOW request (OF1.4):
> vconn|DBG|unix: sent (Success): OFPST_FLOW reply (OF1.4):
> table=1, importance=1, dl_vlan=1 actions=drop
>
>
Re: [ovs-dev] [PATCH v4] dp-packet: Allow DPDK packet resize.
On Wed, 25 Feb 2026 at 12:04, David Marchand via dev wrote:
>
> By default, DPDK based dp-packets points to data buffers that can't be
> expanded dynamically.
> Their layout is as follows:
> - a minimum 128 bytes headroom chosen at DPDK build time
> (RTE_PKTMBUF_HEADROOM),
> - a maximum size chosen at mempool creation,
>
> In some usecases though (like encapsulating with multiple tunnels),
> a 128 bytes headroom is too short.
>
> Keep on using mono segment packets but dynamically allocate buffers
> in DPDK memory and make use of DPDK external buffers API
> (previously used for userspace TSO).
>
> Signed-off-by: David Marchand

Here is a strange failure in Cirrus CI (unrelated to the patch afaict).
Any idea what it could be? Some race?

487: ovs-ofctl replace-flows with --bundle FAILED (ovs-ofctl.at:3438)

./ovs-ofctl.at:3438: print_vconn_debug | vconn_sub | ofctl_strip
--- - 2026-02-25 11:59:36.400451000 +
+++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/487/stdout 2026-02-25 11:59:36.399629000 +
@@ -39,9 +39,9 @@
 bundle_id=0 type=COMMIT_REPLY flags=0
 vconn|DBG|unix: sent (Success): OFPT_HELLO (OF1.5):
 version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
+vconn|DBG|unix: negotiated OpenFlow version 0x05 (we support version 0x06 and earlier, peer supports version 0x05)
 vconn|DBG|unix: received: OFPT_HELLO (OF1.4):
 version bitmap: 0x05
-vconn|DBG|unix: negotiated OpenFlow version 0x05 (we support version 0x06 and earlier, peer supports version 0x05)
 vconn|DBG|unix: received: OFPST_FLOW request (OF1.4):
 vconn|DBG|unix: sent (Success): OFPST_FLOW reply (OF1.4):
 table=1, importance=1, dl_vlan=1 actions=drop

--
David Marchand
