RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy

2024-05-23 Thread Morten Brørup
> From: Du, Frank [mailto:frank...@intel.com]
> Sent: Friday, 24 May 2024 03.05
> 
> > From: Ferruh Yigit 
> > Sent: Thursday, May 23, 2024 9:32 PM
> >
> > On 5/23/2024 10:22 AM, Morten Brørup wrote:
> > >> From: Frank Du [mailto:frank...@intel.com]
> > >> Sent: Thursday, 23 May 2024 10.08
> > >>
> > >> The current calculation assumes that the mbufs are contiguous.
> > >> However, this assumption is incorrect when the mbuf memory spans across
> > >> huge pages.
> > >> To ensure that each mbuf resides exclusively within a single page,
> > >> there are deliberate spacing gaps when allocating mbufs across the
> > >> boundaries.
> > >
> > > I agree that this patch is an improvement over what existed previously.
> > > But I still don't understand the patch description. To me, it looks
> > > like the patch adds a missing check for contiguous memory, and the
> > > patch itself has nothing to do with huge pages. Anyway, if the
> > > maintainer agrees with the description, I don't mind not grasping it.
> > > ;-)
> > >
> > > However, while trying to understand what is happening, I think I found
> > > one more (already existing) bug.
> > > I will show through an example inline below.
> > >
> > >>
> > >> Correct to directly read the size from the mempool memory chunk.
> > >>
> > >> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> > >> Cc: sta...@dpdk.org
> > >>
> > >> Signed-off-by: Frank Du 
> > >>
> > >> ---
> > >> v2:
> > >> * Add virtual contiguous detection for multiple memhdrs
> > >> v3:
> > >> * Use RTE_ALIGN_FLOOR to get the aligned addr
> > >> * Add check on the first memhdr of memory chunks
> > >> v4:
> > >> * Replace the iterating with simple nb_mem_chunks check
> > >> ---
> > >>  drivers/net/af_xdp/rte_eth_af_xdp.c | 33
> > >> +++--
> > >>  1 file changed, 26 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> index 6ba455bb9b..d0431ec089 100644
> > >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> > >>  }
> > >>
> > >>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > >> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> > >> +static inline uintptr_t
> > >> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
> > >>  {
> > >>  struct rte_mempool_memhdr *memhdr;
> > >>  uintptr_t memhdr_addr, aligned_addr;
> > >>
> > >> +if (mp->nb_mem_chunks != 1) {
> > >> +/*
> > >> + * The mempool with multiple chunks is not virtual contiguous but
> > >> + * xsk umem only support single virtual region mapping.
> > >> + */
> > >> +AF_XDP_LOG(ERR, "The mempool contain multiple %u memory chunks\n",
> > >> +   mp->nb_mem_chunks);
> > >> +return 0;
> > >> +}
> > >> +
> > >> +/* Get the mempool base addr and align from the header now */
> > >>  memhdr = STAILQ_FIRST(&mp->mem_list);
> > >> +if (!memhdr) {
> > >> +AF_XDP_LOG(ERR, "The mempool is not populated\n");
> > >> +return 0;
> > >> +}
> > >>  memhdr_addr = (uintptr_t)memhdr->addr;
> > >> -aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > >> +aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
> > >>  *align = memhdr_addr - aligned_addr;
> > >> -
> > >> +*len = memhdr->len;
> > >>  return aligned_addr;
> > >
> > > On x86_64, the page size is 4 KB = 0x1000.
> > >
> > > Let's look at an example where memhdr->addr is not aligned to the page size:
> > >
> > > In the example,
> > > memhdr->addr is 0x700100, and
> > > memhdr->len is 0x20000.
> > >
> > > Then
> > > aligned_addr becomes 0x700000,
> > > *align becomes 0x100, and
> > > *len becomes 0x20000.
> > >
> > >>  }
> > >>
> > >> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > >> pmd_internals *internals,
> > >>  void *base_addr = NULL;
> > >>  struct rte_mempool *mb_pool = rxq->mb_pool;
> > >>  uint64_t umem_size, align = 0;
> > >> +size_t len = 0;
> > >>
> > >>  if (internals->shared_umem) {
> > >>  if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> > >> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> > >>  }
> > >>
> > >>  umem->mb_pool = mb_pool;
> > >> -base_addr = (void *)get_base_addr(mb_pool, &align);
> > >> -umem_size = (uint64_t)mb_pool->populated_size *
> > >> -(uint64_t)usr_config.frame_size +
> > >> -align;
> > >> + 

RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy

2024-05-23 Thread Du, Frank
> -Original Message-
> From: Ferruh Yigit 
> Sent: Thursday, May 23, 2024 9:32 PM
> To: Morten Brørup ; Du, Frank
> ; dev@dpdk.org
> Cc: Loftus, Ciara 
> Subject: Re: [PATCH v4] net/af_xdp: fix umem map size for zero copy
> 
> On 5/23/2024 10:22 AM, Morten Brørup wrote:
> >> From: Frank Du [mailto:frank...@intel.com]
> >> Sent: Thursday, 23 May 2024 10.08
> >>
> >> The current calculation assumes that the mbufs are contiguous.
> >> However, this assumption is incorrect when the mbuf memory spans across
> >> huge pages.
> >> To ensure that each mbuf resides exclusively within a single page,
> >> there are deliberate spacing gaps when allocating mbufs across the
> >> boundaries.
> >
> > I agree that this patch is an improvement over what existed previously.
> > But I still don't understand the patch description. To me, it looks
> > like the patch adds a missing check for contiguous memory, and the
> > patch itself has nothing to do with huge pages. Anyway, if the
> > maintainer agrees with the description, I don't mind not grasping it.
> > ;-)
> >
> > However, while trying to understand what is happening, I think I found
> > one more (already existing) bug.
> > I will show through an example inline below.
> >
> >>
> >> Correct to directly read the size from the mempool memory chunk.
> >>
> >> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Frank Du 
> >>
> >> ---
> >> v2:
> >> * Add virtual contiguous detection for multiple memhdrs
> >> v3:
> >> * Use RTE_ALIGN_FLOOR to get the aligned addr
> >> * Add check on the first memhdr of memory chunks
> >> v4:
> >> * Replace the iterating with simple nb_mem_chunks check
> >> ---
> >>  drivers/net/af_xdp/rte_eth_af_xdp.c | 33
> >> +++--
> >>  1 file changed, 26 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> index 6ba455bb9b..d0431ec089 100644
> >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> >>  }
> >>
> >>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> >> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> >> +static inline uintptr_t
> >> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
> >>  {
> >>struct rte_mempool_memhdr *memhdr;
> >>uintptr_t memhdr_addr, aligned_addr;
> >>
> >> +  if (mp->nb_mem_chunks != 1) {
> >> +  /*
> >> +   * The mempool with multiple chunks is not virtual contiguous but
> >> +   * xsk umem only support single virtual region mapping.
> >> +   */
> >> +  AF_XDP_LOG(ERR, "The mempool contain multiple %u memory chunks\n",
> >> + mp->nb_mem_chunks);
> >> +  return 0;
> >> +  }
> >> +
> >> +  /* Get the mempool base addr and align from the header now */
> >>memhdr = STAILQ_FIRST(&mp->mem_list);
> >> +  if (!memhdr) {
> >> +  AF_XDP_LOG(ERR, "The mempool is not populated\n");
> >> +  return 0;
> >> +  }
> >>memhdr_addr = (uintptr_t)memhdr->addr;
> >> -  aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> >> +  aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
> >>*align = memhdr_addr - aligned_addr;
> >> -
> >> +  *len = memhdr->len;
> >>return aligned_addr;
> >
> > On x86_64, the page size is 4 KB = 0x1000.
> >
> > Let's look at an example where memhdr->addr is not aligned to the page size:
> >
> > In the example,
> > memhdr->addr is 0x700100, and
> > memhdr->len is 0x20000.
> >
> > Then
> > aligned_addr becomes 0x700000,
> > *align becomes 0x100, and
> > *len becomes 0x20000.
> >
> >>  }
> >>
> >> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct
> >> pmd_internals *internals,
> >>void *base_addr = NULL;
> >>struct rte_mempool *mb_pool = rxq->mb_pool;
> >>ui

Re: [PATCH v4] net/af_xdp: fix umem map size for zero copy

2024-05-23 Thread Ferruh Yigit
On 5/23/2024 10:22 AM, Morten Brørup wrote:
>> From: Frank Du [mailto:frank...@intel.com]
>> Sent: Thursday, 23 May 2024 10.08
>>
>> The current calculation assumes that the mbufs are contiguous. However,
>> this assumption is incorrect when the mbuf memory spans across huge pages.
>> To ensure that each mbuf resides exclusively within a single page, there
>> are deliberate spacing gaps when allocating mbufs across the boundaries.
> 
> I agree that this patch is an improvement over what existed previously.
> But I still don't understand the patch description. To me, it looks like the 
> patch adds a missing check for contiguous memory, and the patch itself has 
> nothing to do with huge pages. Anyway, if the maintainer agrees with the 
> description, I don't mind not grasping it. ;-)
> 
> However, while trying to understand what is happening, I think I found one 
> more (already existing) bug.
> I will show through an example inline below.
> 
>>
>> Correct to directly read the size from the mempool memory chunk.
>>
>> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Frank Du 
>>
>> ---
>> v2:
>> * Add virtual contiguous detection for multiple memhdrs
>> v3:
>> * Use RTE_ALIGN_FLOOR to get the aligned addr
>> * Add check on the first memhdr of memory chunks
>> v4:
>> * Replace the iterating with simple nb_mem_chunks check
>> ---
>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 33 +++--
>>  1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index 6ba455bb9b..d0431ec089 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>>  }
>>
>>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
>> *align)
>> +static inline uintptr_t
>> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
>>  {
>>  struct rte_mempool_memhdr *memhdr;
>>  uintptr_t memhdr_addr, aligned_addr;
>>
>> +if (mp->nb_mem_chunks != 1) {
>> +/*
>> + * The mempool with multiple chunks is not virtual contiguous but
>> + * xsk umem only support single virtual region mapping.
>> + */
>> +AF_XDP_LOG(ERR, "The mempool contain multiple %u memory chunks\n",
>> +   mp->nb_mem_chunks);
>> +return 0;
>> +}
>> +
>> +/* Get the mempool base addr and align from the header now */
>>  memhdr = STAILQ_FIRST(&mp->mem_list);
>> +if (!memhdr) {
>> +AF_XDP_LOG(ERR, "The mempool is not populated\n");
>> +return 0;
>> +}
>>  memhdr_addr = (uintptr_t)memhdr->addr;
>> -aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>> +aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
>>  *align = memhdr_addr - aligned_addr;
>> -
>> +*len = memhdr->len;
>>  return aligned_addr;
> 
> On x86_64, the page size is 4 KB = 0x1000.
> 
> Let's look at an example where memhdr->addr is not aligned to the page size:
> 
> In the example,
> memhdr->addr is 0x700100, and
> memhdr->len is 0x20000.
> 
> Then
> aligned_addr becomes 0x700000,
> *align becomes 0x100, and
> *len becomes 0x20000.
> 
>>  }
>>
>> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>> *internals,
>>  void *base_addr = NULL;
>>  struct rte_mempool *mb_pool = rxq->mb_pool;
>>  uint64_t umem_size, align = 0;
>> +size_t len = 0;
>>
>>  if (internals->shared_umem) {
>>  if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
>> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct 
>> pmd_internals
>> *internals,
>>  }
>>
>>  umem->mb_pool = mb_pool;
>> -base_addr = (void *)get_base_addr(mb_pool, &align);
>> -umem_size = (uint64_t)mb_pool->populated_size *
>> -(uint64_t)usr_config.frame_size +
>> -align;
>> +base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>> +if (!base_addr) {
>> +AF_XDP_LOG(ERR, "The memory pool can't be mapped as umem\n");
>> +goto err;
>> +}
>> +umem_size = (uint64_t)len + align;
> 
> Here, umem_size becomes 0x20100.
> 
>>
>>  ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
>>  &rxq->fq, &rxq->cq, &usr_config);
> 
> Here, xsk_umem__create() is called with the base_address (0x700000) preceding
> the address of the memory chunk (0x700100).
> It looks like a bug, causing a buffer underrun. I.e. will it access memory 
> starting at base_address?
> 

I already asked for this on v2; Frank mentioned that area is not
accessed

RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy

2024-05-23 Thread Morten Brørup
> From: Frank Du [mailto:frank...@intel.com]
> Sent: Thursday, 23 May 2024 10.08
> 
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the mbuf memory spans across huge pages.
> To ensure that each mbuf resides exclusively within a single page, there
> are deliberate spacing gaps when allocating mbufs across the boundaries.

I agree that this patch is an improvement over what existed previously.
But I still don't understand the patch description. To me, it looks like the 
patch adds a missing check for contiguous memory, and the patch itself has 
nothing to do with huge pages. Anyway, if the maintainer agrees with the 
description, I don't mind not grasping it. ;-)
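
If the description means that each mbuf is padded so it never straddles a page
boundary, a rough, purely hypothetical sketch of the effect (all sizes are made
up, and the exact mempool populate behaviour is not shown in this thread) would
be that the pool's virtual span grows beyond obj_count * obj_size:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
        const size_t page_size = 4096;   /* hypothetical page size */
        const size_t obj_size  = 2500;   /* hypothetical per-mbuf footprint */
        const size_t obj_count = 1000;

        /* Objects per page when none may cross a page boundary. */
        size_t objs_per_page = page_size / obj_size;                       /* 1 */
        size_t pages_needed  = (obj_count + objs_per_page - 1) / objs_per_page;

        size_t packed_size = obj_count * obj_size;        /* 2500000 bytes */
        size_t real_span   = pages_needed * page_size;    /* 4096000 bytes */

        printf("packed: %zu bytes, span with per-page gaps: %zu bytes\n",
               packed_size, real_span);
        return 0;
    }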

However, while trying to understand what is happening, I think I found one more 
(already existing) bug.
I will show through an example inline below.

> 
> Correct to directly read the size from the mempool memory chunk.
> 
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Frank Du 
> 
> ---
> v2:
> * Add virtual contiguous detection for multiple memhdrs
> v3:
> * Use RTE_ALIGN_FLOOR to get the aligned addr
> * Add check on the first memhdr of memory chunks
> v4:
> * Replace the iterating with simple nb_mem_chunks check
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 33 +++--
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 6ba455bb9b..d0431ec089 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>  }
> 
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t
> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
>  {
>   struct rte_mempool_memhdr *memhdr;
>   uintptr_t memhdr_addr, aligned_addr;
> 
> + if (mp->nb_mem_chunks != 1) {
> + /*
> +  * The mempool with multiple chunks is not virtual contiguous but
> +  * xsk umem only support single virtual region mapping.
> +  */
> + AF_XDP_LOG(ERR, "The mempool contain multiple %u memory chunks\n",
> +mp->nb_mem_chunks);
> + return 0;
> + }
> +
> + /* Get the mempool base addr and align from the header now */
>   memhdr = STAILQ_FIRST(&mp->mem_list);
> + if (!memhdr) {
> + AF_XDP_LOG(ERR, "The mempool is not populated\n");
> + return 0;
> + }
>   memhdr_addr = (uintptr_t)memhdr->addr;
> - aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> + aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
>   *align = memhdr_addr - aligned_addr;
> -
> + *len = memhdr->len;
>   return aligned_addr;

On x86_64, the page size is 4 KB = 0x1000.

Let's look at an example where memhdr->addr is not aligned to the page size:

In the example,
memhdr->addr is 0x700100, and
memhdr->len is 0x20000.

Then
aligned_addr becomes 0x700000,
*align becomes 0x100, and
*len becomes 0x20000.
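
For reference, the same numbers reproduced in a small standalone sketch (plain
bit math stands in for RTE_ALIGN_FLOOR, and a 4 KB page size is assumed, as in
the example):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uintptr_t page_size   = 0x1000;    /* 4 KB */
        const uintptr_t memhdr_addr = 0x700100;  /* example chunk start */
        const uint64_t  memhdr_len  = 0x20000;   /* example chunk length */

        /* Equivalent of RTE_ALIGN_FLOOR(memhdr_addr, page_size). */
        uintptr_t aligned_addr = memhdr_addr & ~(page_size - 1);
        uint64_t  align        = memhdr_addr - aligned_addr;

        printf("aligned_addr = 0x%jx\n", (uintmax_t)aligned_addr); /* 0x700000 */
        printf("align        = 0x%jx\n", (uintmax_t)align);        /* 0x100   */
        printf("len          = 0x%jx\n", (uintmax_t)memhdr_len);   /* 0x20000 */
        return 0;
    }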

>  }
> 
> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
> *internals,
>   void *base_addr = NULL;
>   struct rte_mempool *mb_pool = rxq->mb_pool;
>   uint64_t umem_size, align = 0;
> + size_t len = 0;
> 
>   if (internals->shared_umem) {
>   if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
> *internals,
>   }
> 
>   umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> + if (!base_addr) {
> + AF_XDP_LOG(ERR, "The memory pool can't be mapped as umem\n");
> + goto err;
> + }
> + umem_size = (uint64_t)len + align;

Here, umem_size becomes 0x20100.
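
Spelled out with the example values (aligned_addr = 0x700000, align = 0x100,
len = 0x20000):

    umem_size   = len + align = 0x20000 + 0x100  = 0x20100
    umem range  = [0x700000, 0x700000 + 0x20100) = [0x700000, 0x720100)
    chunk range = [0x700100, 0x700100 + 0x20000) = [0x700100, 0x720100)

So the umem mapping covers the whole memory chunk plus the 0x100 bytes of the
page in front of it.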

> 
>   ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
>   &rxq->fq, &rxq->cq, &usr_config);

Here, xsk_umem__create() is called with the base_address (0x700000) preceding
the address of the memory chunk (0x700100).
It looks like a bug, causing a buffer underrun. I.e. will it access memory 
starting at base_address?

If I'm correct, the code should probably do this for alignment instead:

aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
*align = aligned_addr - memhdr_addr;
umem_size = (uint64_t)len - align;
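
With the same example values, that ceiling-based variant would work out to
(just arithmetic on the example, not a tested change):

    aligned_addr = RTE_ALIGN_CEIL(0x700100, 0x1000) = 0x701000
    align        = 0x701000 - 0x700100              = 0xF00
    umem_size    = 0x20000 - 0xF00                  = 0x1F100
    umem range   = [0x701000, 0x701000 + 0x1F100)   = [0x701000, 0x720100)

i.e. the mapping would then start inside the memory chunk rather than before it.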


Disclaimer: I don't know m