RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy
> From: Du, Frank [mailto:frank...@intel.com]
> Sent: Friday, 24 May 2024 03.05
>
> > From: Ferruh Yigit
> > Sent: Thursday, May 23, 2024 9:32 PM
> >
> > On 5/23/2024 10:22 AM, Morten Brørup wrote:
> > > [...]

[...]
RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy
> -----Original Message-----
> From: Ferruh Yigit
> Sent: Thursday, May 23, 2024 9:32 PM
> To: Morten Brørup; Du, Frank; dev@dpdk.org
> Cc: Loftus, Ciara
> Subject: Re: [PATCH v4] net/af_xdp: fix umem map size for zero copy
>
> On 5/23/2024 10:22 AM, Morten Brørup wrote:
> > [...]

[...]
Re: [PATCH v4] net/af_xdp: fix umem map size for zero copy
On 5/23/2024 10:22 AM, Morten Brørup wrote:
> [...]
>
>> 	umem->mb_pool = mb_pool;
>> -	base_addr = (void *)get_base_addr(mb_pool, &align);
>> -	umem_size = (uint64_t)mb_pool->populated_size *
>> -		    (uint64_t)usr_config.frame_size +
>> -		    align;
>> +	base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>> +	if (!base_addr) {
>> +		AF_XDP_LOG(ERR, "The memory pool can't be mapped as umem\n");
>> +		goto err;
>> +	}
>> +	umem_size = (uint64_t)len + align;
>
> Here, umem_size becomes 0x20100.
>
>> 	ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
>> 			       &rxq->fq, &rxq->cq, &usr_config);
>
> Here, xsk_umem__create() is called with the base_address (0x700000)
> preceding the address of the memory chunk (0x700100).
> It looks like a bug, causing a buffer underrun. I.e. will it access memory
> starting at base_address?

I already asked about this on v2; Frank mentioned that area is not accessed.
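To put numbers on the area in question, here is a minimal standalone sketch (plain C, using the example values from the review; ALIGN_FLOOR is a local stand-in for DPDK's RTE_ALIGN_FLOOR, and 0x1000 stands in for getpagesize()):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 0x1000u	/* stand-in for getpagesize() */
    /* Local stand-in for RTE_ALIGN_FLOOR (power-of-two alignment). */
    #define ALIGN_FLOOR(v, a) ((v) & ~((uintptr_t)(a) - 1))

    int main(void)
    {
        uintptr_t memhdr_addr = 0x700100;	/* example chunk start */
        size_t len = 0x20000;			/* example chunk length */

        uintptr_t base_addr = ALIGN_FLOOR(memhdr_addr, PAGE_SIZE);	/* 0x700000 */
        uint64_t umem_size = (uint64_t)len + (memhdr_addr - base_addr);	/* 0x20100 */

        /* The bytes in [base_addr, memhdr_addr) are registered with the
         * umem but precede the mempool chunk. */
        printf("umem [%#lx, %#lx), %lu bytes before the chunk\n",
               (unsigned long)base_addr,
               (unsigned long)(base_addr + umem_size),
               (unsigned long)(memhdr_addr - base_addr));
        return 0;
    }

Whether anything ever dereferences those 0x100 bytes is exactly the question raised above; per the v2 discussion they are mapped but not accessed.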
RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy
> From: Frank Du [mailto:frank...@intel.com]
> Sent: Thursday, 23 May 2024 10.08
>
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the mbuf memory spans across huge pages.
> To ensure that each mbuf resides exclusively within a single page, there
> are deliberate spacing gaps when allocating mbufs across the boundaries.

I agree that this patch is an improvement of what existed previously.
But I still don't understand the patch description. To me, it looks like the
patch adds a missing check for contiguous memory, and the patch itself has
nothing to do with huge pages. Anyway, if the maintainer agrees with the
description, I don't mind not grasping it. ;-)

However, while trying to understand what is happening, I think I found one
more (already existing) bug.
I will show it through an example inline below.

> Correct to directly read the size from the mempool memory chunk.
>
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Frank Du
>
> ---
> v2:
> * Add virtually contiguous detection for multiple memhdrs
> v3:
> * Use RTE_ALIGN_FLOOR to get the aligned addr
> * Add check on the first memhdr of memory chunks
> v4:
> * Replace the iterating with a simple nb_mem_chunks check
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 6ba455bb9b..d0431ec089 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>  }
>
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> +static inline uintptr_t
> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
>  {
>  	struct rte_mempool_memhdr *memhdr;
>  	uintptr_t memhdr_addr, aligned_addr;
>
> +	if (mp->nb_mem_chunks != 1) {
> +		/*
> +		 * A mempool with multiple chunks is not virtually contiguous,
> +		 * but xsk umem only supports a single virtual region mapping.
> +		 */
> +		AF_XDP_LOG(ERR, "The mempool contains multiple (%u) memory chunks\n",
> +			   mp->nb_mem_chunks);
> +		return 0;
> +	}
> +
> +	/* Get the mempool base addr and align from the header now */
>  	memhdr = STAILQ_FIRST(&mp->mem_list);
> +	if (!memhdr) {
> +		AF_XDP_LOG(ERR, "The mempool is not populated\n");
> +		return 0;
> +	}
>  	memhdr_addr = (uintptr_t)memhdr->addr;
> -	aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> +	aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
>  	*align = memhdr_addr - aligned_addr;
> -
> +	*len = memhdr->len;
>  	return aligned_addr;

On x86_64, the page size is 4 KB = 0x1000.

Let's look at an example where memhdr->addr is not aligned to the page size:

In the example,
memhdr->addr is 0x700100, and
memhdr->len is 0x20000.

Then
aligned_addr becomes 0x700000,
*align becomes 0x100, and
*len becomes 0x20000.
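As a quick sanity check on the arithmetic, a standalone sketch (plain C; ALIGN_FLOOR is a local stand-in for RTE_ALIGN_FLOOR with a power-of-two alignment, and 0x1000 stands in for getpagesize()) confirming that the new macro matches the old hand-rolled mask for these example values:

    #include <stdint.h>
    #include <assert.h>

    /* Local stand-in for RTE_ALIGN_FLOOR (power-of-two alignment). */
    #define ALIGN_FLOOR(v, a) ((v) & ~((uintptr_t)(a) - 1))

    int main(void)
    {
        uintptr_t page = 0x1000;		/* getpagesize() on x86_64 */
        uintptr_t memhdr_addr = 0x700100;	/* example value from above */

        uintptr_t old_way = memhdr_addr & ~(page - 1);	/* pre-v3 expression */
        uintptr_t new_way = ALIGN_FLOOR(memhdr_addr, page);

        assert(old_way == new_way);		/* the two expressions agree */
        assert(new_way == 0x700000);		/* aligned_addr */
        assert(memhdr_addr - new_way == 0x100);	/* *align */
        return 0;
    }

For power-of-two page sizes the two expressions are identical, so the v3 change looks purely cosmetic.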
>  }
>
> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>  	void *base_addr = NULL;
>  	struct rte_mempool *mb_pool = rxq->mb_pool;
>  	uint64_t umem_size, align = 0;
> +	size_t len = 0;
>
>  	if (internals->shared_umem) {
>  		if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>  	}
>
>  	umem->mb_pool = mb_pool;
> -	base_addr = (void *)get_base_addr(mb_pool, &align);
> -	umem_size = (uint64_t)mb_pool->populated_size *
> -		    (uint64_t)usr_config.frame_size +
> -		    align;
> +	base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> +	if (!base_addr) {
> +		AF_XDP_LOG(ERR, "The memory pool can't be mapped as umem\n");
> +		goto err;
> +	}
> +	umem_size = (uint64_t)len + align;

Here, umem_size becomes 0x20100.

>
> 	ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> 			       &rxq->fq, &rxq->cq, &usr_config);

Here, xsk_umem__create() is called with the base_address (0x700000) preceding
the address of the memory chunk (0x700100).
It looks like a bug, causing a buffer underrun. I.e. will it access memory
starting at base_address?

If I'm correct, the code should probably do this for alignment instead:

	aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
	*align = aligned_addr - memhdr_addr;
	umem_size = (uint64_t)len - align;

Disclaimer: I don't know m
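For comparison, here is a minimal standalone sketch (plain C, using the example values from above; PAGE stands in for getpagesize(), and open-coded masks stand in for RTE_ALIGN_FLOOR/RTE_ALIGN_CEIL) that computes both variants side by side:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE 0x1000u	/* stand-in for getpagesize() */

    int main(void)
    {
        uintptr_t memhdr_addr = 0x700100;	/* example chunk start */
        size_t len = 0x20000;			/* example chunk length */

        /* Patch (v4): align the base DOWN, grow the mapping by the slack. */
        uintptr_t floor_base = memhdr_addr & ~(uintptr_t)(PAGE - 1);
        uint64_t floor_size = (uint64_t)len + (memhdr_addr - floor_base);

        /* Suggestion above: align the base UP, shrink the mapping. */
        uintptr_t ceil_base = (memhdr_addr + PAGE - 1) & ~(uintptr_t)(PAGE - 1);
        uint64_t ceil_size = (uint64_t)len - (ceil_base - memhdr_addr);

        printf("floor: base=%#lx size=%#lx\n",	/* 0x700000, 0x20100 */
               (unsigned long)floor_base, (unsigned long)floor_size);
        printf("ceil:  base=%#lx size=%#lx\n",	/* 0x701000, 0x1f100 */
               (unsigned long)ceil_base, (unsigned long)ceil_size);
        return 0;
    }

The trade-off is visible in the numbers: aligning down maps 0x100 bytes that precede the chunk, while aligning up leaves the first 0xf00 bytes of the chunk outside the umem.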