Re: iommu dma mapping alignment requirements
BTW, I need to know urgently what HW is broken by this.

Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
>> Sounds good. Thanks!
>>
>> Note that these smaller sub-host-page-sized mappings might pollute the
>> address space, causing fully aligned host-page-sized maps to become
>> scarce... Maybe there's a clever way to keep those in their own segment
>> of the address space?
>
> We already have a large vs. small split in the iommu virtual space to
> alleviate this (though it's not a hard constraint, we can still get into
> the "other" side if the default one is full).
>
> Try that patch and let me know:

Seems to be working! :)

> Index: linux-work/arch/powerpc/kernel/iommu.c
> ===================================================================
> --- linux-work.orig/arch/powerpc/kernel/iommu.c	2007-12-21 10:39:39.0 +1100
> +++ linux-work/arch/powerpc/kernel/iommu.c	2007-12-21 10:46:18.0 +1100
> @@ -278,6 +278,7 @@ int iommu_map_sg(struct iommu_table *tbl
>  	unsigned long flags;
>  	struct scatterlist *s, *outs, *segstart;
>  	int outcount, incount, i;
> +	unsigned int align;
>  	unsigned long handle;
> 
>  	BUG_ON(direction == DMA_NONE);
> @@ -309,7 +310,11 @@ int iommu_map_sg(struct iommu_table *tbl
>  		/* Allocate iommu entries for that segment */
>  		vaddr = (unsigned long) sg_virt(s);
>  		npages = iommu_num_pages(vaddr, slen);
> -		entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0);
> +		align = 0;
> +		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT && (vaddr & ~PAGE_MASK) == 0)
> +			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
> +		entry = iommu_range_alloc(tbl, npages, &handle,
> +					  mask >> IOMMU_PAGE_SHIFT, align);
> 
>  		DBG("  - vaddr: %lx, size: %lx\n", vaddr, slen);
> @@ -572,7 +577,7 @@ dma_addr_t iommu_map_single(struct iommu
>  {
>  	dma_addr_t dma_handle = DMA_ERROR_CODE;
>  	unsigned long uaddr;
> -	unsigned int npages;
> +	unsigned int npages, align;
> 
>  	BUG_ON(direction == DMA_NONE);
> @@ -580,8 +585,13 @@ dma_addr_t iommu_map_single(struct iommu
>  	npages = iommu_num_pages(uaddr, size);
> 
>  	if (tbl) {
> +		align = 0;
> +		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT &&
> +		    ((unsigned long)vaddr & ~PAGE_MASK) == 0)
> +			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
> +
>  		dma_handle = iommu_alloc(tbl, vaddr, npages, direction,
> -					 mask >> IOMMU_PAGE_SHIFT, 0);
> +					 mask >> IOMMU_PAGE_SHIFT, align);
>  		if (dma_handle == DMA_ERROR_CODE) {
>  			if (printk_ratelimit()) {
>  				printk(KERN_INFO "iommu_alloc failed, "
Re: iommu dma mapping alignment requirements
> Sounds good. Thanks!
>
> Note that these smaller sub-host-page-sized mappings might pollute the
> address space, causing fully aligned host-page-sized maps to become
> scarce... Maybe there's a clever way to keep those in their own segment
> of the address space?

We already have a large vs. small split in the iommu virtual space to
alleviate this (though it's not a hard constraint, we can still get into
the "other" side if the default one is full).

Try that patch and let me know:

Index: linux-work/arch/powerpc/kernel/iommu.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/iommu.c	2007-12-21 10:39:39.0 +1100
+++ linux-work/arch/powerpc/kernel/iommu.c	2007-12-21 10:46:18.0 +1100
@@ -278,6 +278,7 @@ int iommu_map_sg(struct iommu_table *tbl
 	unsigned long flags;
 	struct scatterlist *s, *outs, *segstart;
 	int outcount, incount, i;
+	unsigned int align;
 	unsigned long handle;
 
 	BUG_ON(direction == DMA_NONE);
@@ -309,7 +310,11 @@ int iommu_map_sg(struct iommu_table *tbl
 		/* Allocate iommu entries for that segment */
 		vaddr = (unsigned long) sg_virt(s);
 		npages = iommu_num_pages(vaddr, slen);
-		entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0);
+		align = 0;
+		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT && (vaddr & ~PAGE_MASK) == 0)
+			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
+		entry = iommu_range_alloc(tbl, npages, &handle,
+					  mask >> IOMMU_PAGE_SHIFT, align);
 
 		DBG("  - vaddr: %lx, size: %lx\n", vaddr, slen);
@@ -572,7 +577,7 @@ dma_addr_t iommu_map_single(struct iommu
 {
 	dma_addr_t dma_handle = DMA_ERROR_CODE;
 	unsigned long uaddr;
-	unsigned int npages;
+	unsigned int npages, align;
 
 	BUG_ON(direction == DMA_NONE);
@@ -580,8 +585,13 @@ dma_addr_t iommu_map_single(struct iommu
 	npages = iommu_num_pages(uaddr, size);
 
 	if (tbl) {
+		align = 0;
+		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT &&
+		    ((unsigned long)vaddr & ~PAGE_MASK) == 0)
+			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
+
 		dma_handle = iommu_alloc(tbl, vaddr, npages, direction,
-					 mask >> IOMMU_PAGE_SHIFT, 0);
+					 mask >> IOMMU_PAGE_SHIFT, align);
 		if (dma_handle == DMA_ERROR_CODE) {
 			if (printk_ratelimit()) {
 				printk(KERN_INFO "iommu_alloc failed, "
Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
> On Thu, 2007-12-20 at 15:02 -0600, Steve Wise wrote:
>> Benjamin Herrenschmidt wrote:
>>> Adding a few more people to the discussion. You may well be right and
>>> we would have to provide the same alignment, though that sucks a bit
>>> as one of the reasons we switched to 4K for the IOMMU is that the
>>> iommu space available on pSeries is very small and we were running
>>> out of it with 64K pages and lots of networking activity.
>>
>> But smarter NIC drivers can resolve this too, I think, perhaps by
>> carving up full pages of mapped buffers instead of just assuming
>> mapping is free...
>
> True, but the problem still happens today: if we switch back to 64K
> iommu page size (which should be possible, I need to fix that), we
> -will- run out of iommu space on typical workloads, and that is not
> acceptable. So we need to find a compromise.
>
> What I might do is something along the lines of: if size >= PAGE_SIZE,
> and vaddr (page_address + offset) is PAGE_SIZE aligned, then I enforce
> alignment of the resulting mapping. That should fix your case. Anything
> requesting smaller than PAGE_SIZE mappings would lose that alignment,
> but I -think- it should be safe, and you still always get 4K alignment
> anyway (+/- your offset) so at least small alignment restrictions are
> still enforced (such as cache line alignment etc...).
>
> I'll send you a test patch later today.
>
> Ben.

Sounds good. Thanks!

Note that these smaller sub-host-page-sized mappings might pollute the
address space, causing fully aligned host-page-sized maps to become
scarce... Maybe there's a clever way to keep those in their own segment
of the address space?
Re: iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 15:02 -0600, Steve Wise wrote:
> Benjamin Herrenschmidt wrote:
>> Adding a few more people to the discussion. You may well be right and
>> we would have to provide the same alignment, though that sucks a bit
>> as one of the reasons we switched to 4K for the IOMMU is that the
>> iommu space available on pSeries is very small and we were running out
>> of it with 64K pages and lots of networking activity.
>
> But smarter NIC drivers can resolve this too, I think, perhaps by
> carving up full pages of mapped buffers instead of just assuming
> mapping is free...

True, but the problem still happens today: if we switch back to 64K iommu
page size (which should be possible, I need to fix that), we -will- run
out of iommu space on typical workloads, and that is not acceptable. So
we need to find a compromise.

What I might do is something along the lines of: if size >= PAGE_SIZE,
and vaddr (page_address + offset) is PAGE_SIZE aligned, then I enforce
alignment of the resulting mapping. That should fix your case. Anything
requesting smaller than PAGE_SIZE mappings would lose that alignment, but
I -think- it should be safe, and you still always get 4K alignment anyway
(+/- your offset) so at least small alignment restrictions are still
enforced (such as cache line alignment etc...).

I'll send you a test patch later today.

Ben.
Re: [ofa-general] iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
> On Thu, 2007-12-20 at 13:29 -0600, Steve Wise wrote:
>> Or based on the alignment of vaddr actually...
>
> The latter wouldn't be realistic. What I think might be necessary,
> though it would definitely cause us problems with running out of iommu
> space (which is the reason we did the switch down to 4K), is to provide
> alignment to the real page size, and alignment to the allocation order
> for dma_map_consistent.
>
> It might be possible to -tweak- and only provide alignment to the page
> size for allocations that are larger than IOMMU_PAGE_SIZE. That would
> solve the problem with small network packets eating up too much iommu
> space though.
>
> What do you think ?

That might work. If you gimme a patch, I'll try it out!

Steve.
Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote:
> Adding a few more people to the discussion. You may well be right and we
> would have to provide the same alignment, though that sucks a bit as one
> of the reasons we switched to 4K for the IOMMU is that the iommu space
> available on pSeries is very small and we were running out of it with
> 64K pages and lots of networking activity.

But smarter NIC drivers can resolve this too, I think, perhaps by carving
up full pages of mapped buffers instead of just assuming mapping is
free... perhaps...
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 13:29 -0600, Steve Wise wrote:
> Or based on the alignment of vaddr actually...

The latter wouldn't be realistic. What I think might be necessary, though
it would definitely cause us problems with running out of iommu space
(which is the reason we did the switch down to 4K), is to provide
alignment to the real page size, and alignment to the allocation order
for dma_map_consistent.

It might be possible to -tweak- and only provide alignment to the page
size for allocations that are larger than IOMMU_PAGE_SIZE. That would
solve the problem with small network packets eating up too much iommu
space though.

What do you think ?

Ben.
Re: iommu dma mapping alignment requirements
Adding a few more people to the discussion. You may well be right and we
would have to provide the same alignment, though that sucks a bit as one
of the reasons we switched to 4K for the IOMMU is that the iommu space
available on pSeries is very small and we were running out of it with 64K
pages and lots of networking activity.

On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote:
> Hey Roland (and any iommu/ppc/dma experts out there):
>
> I'm debugging a data corruption issue that happens on PPC64 systems
> running rdma on kernels where the iommu page size is 4KB yet the host
> page size is 64KB. This "feature" was added to the PPC64 code recently,
> and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB
> page size, no problems. If the kernel is prior to 2.6.23 then 64KB page
> configs work too. It's just a problem when the iommu page size != host
> page size.
>
> It appears that my problem boils down to a single host page of memory
> that is mapped for dma, and the dma address returned by dma_map_sg() is
> _not_ 64KB aligned. Here is an example:
>
> app registers va 0x2d9a3000 len 12288
> ib_umem_get() creates and maps a umem and chunk that looks like (dumping
> state from a registered user memory region):
>
> > umem len 12288 off 12288 pgsz 65536 shift 16
> > chunk 0: nmap 1 nents 1
> > sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536
>
> So the kernel maps 1 full page for this MR. But note that the dma
> address is 5bff4000, which is 4KB aligned, not 64KB aligned. I think
> this is causing grief to the RDMA HW.
>
> My first question is: Is there an assumption or requirement in linux
> that dma_addresses should have the same alignment as the host address
> they are mapped to? IE the rdma core is mapping the entire 64KB page,
> but the mapping doesn't begin on a 64KB page boundary.
>
> If this mapping is considered valid, then perhaps the rdma hw is at
> fault here. But I'm wondering if this is a PPC/iommu bug.
>
> BTW: Here is what the Memory Region looks like to the HW:
>
> > TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2
> > perms RW rem_inv_dis 0 addr_type VATO
> > bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0
> > len 12288 va 2d9a3000 bind_cnt 0
> > PBL: 5bff4000
>
> Any thoughts?
>
> Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
Steve Wise wrote:
> Roland Dreier wrote:
>>> It appears that my problem boils down to a single host page of memory
>>> that is mapped for dma, and the dma address returned by dma_map_sg()
>>> is _not_ 64KB aligned. Here is an example:
>>
>>> My first question is: Is there an assumption or requirement in linux
>>> that dma_addresses should have the same alignment as the host address
>>> they are mapped to? IE the rdma core is mapping the entire 64KB page,
>>> but the mapping doesn't begin on a 64KB page boundary.
>>
>> I don't think this is explicitly documented anywhere, but it certainly
>> seems that we want the bus address to be page-aligned in this case.
>>
>> For mthca/mlx4 at least, we tell the adapter what the host page size
>> is (so that it knows how to align doorbell pages etc) and I think this
>> sort of thing would confuse the HW.
>>
>>  - R.
>
> In arch/powerpc/kernel/iommu.c:iommu_map_sg() I see that it calls
> iommu_range_alloc() with an alignment_order of 0:
>
> 	vaddr = (unsigned long)page_address(s->page) + s->offset;
> 	npages = iommu_num_pages(vaddr, slen);
> 	entry = iommu_range_alloc(tbl, npages, &handle,
> 				  mask >> IOMMU_PAGE_SHIFT, 0);
>
> But perhaps the alignment order needs to be based on the host page size?

Or based on the alignment of vaddr actually...
Re: [ofa-general] iommu dma mapping alignment requirements
Roland Dreier wrote:
>> It appears that my problem boils down to a single host page of memory
>> that is mapped for dma, and the dma address returned by dma_map_sg()
>> is _not_ 64KB aligned. Here is an example:
>
>> My first question is: Is there an assumption or requirement in linux
>> that dma_addresses should have the same alignment as the host address
>> they are mapped to? IE the rdma core is mapping the entire 64KB page,
>> but the mapping doesn't begin on a 64KB page boundary.
>
> I don't think this is explicitly documented anywhere, but it certainly
> seems that we want the bus address to be page-aligned in this case.
>
> For mthca/mlx4 at least, we tell the adapter what the host page size is
> (so that it knows how to align doorbell pages etc) and I think this
> sort of thing would confuse the HW.
>
>  - R.

In arch/powerpc/kernel/iommu.c:iommu_map_sg() I see that it calls
iommu_range_alloc() with an alignment_order of 0:

	vaddr = (unsigned long)page_address(s->page) + s->offset;
	npages = iommu_num_pages(vaddr, slen);
	entry = iommu_range_alloc(tbl, npages, &handle,
				  mask >> IOMMU_PAGE_SHIFT, 0);

But perhaps the alignment order needs to be based on the host page size?

Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
> It appears that my problem boils down to a single host page of memory
> that is mapped for dma, and the dma address returned by dma_map_sg()
> is _not_ 64KB aligned. Here is an example:

> My first question is: Is there an assumption or requirement in linux
> that dma_addresses should have the same alignment as the host address
> they are mapped to? IE the rdma core is mapping the entire 64KB page,
> but the mapping doesn't begin on a 64KB page boundary.

I don't think this is explicitly documented anywhere, but it certainly
seems that we want the bus address to be page-aligned in this case.

For mthca/mlx4 at least, we tell the adapter what the host page size is
(so that it knows how to align doorbell pages etc) and I think this sort
of thing would confuse the HW.

 - R.
iommu dma mapping alignment requirements
Hey Roland (and any iommu/ppc/dma experts out there):

I'm debugging a data corruption issue that happens on PPC64 systems
running rdma on kernels where the iommu page size is 4KB yet the host
page size is 64KB. This "feature" was added to the PPC64 code recently,
and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB
page size, no problems. If the kernel is prior to 2.6.23 then 64KB page
configs work too. It's just a problem when the iommu page size != host
page size.

It appears that my problem boils down to a single host page of memory
that is mapped for dma, and the dma address returned by dma_map_sg() is
_not_ 64KB aligned. Here is an example:

app registers va 0x2d9a3000 len 12288
ib_umem_get() creates and maps a umem and chunk that looks like (dumping
state from a registered user memory region):

> umem len 12288 off 12288 pgsz 65536 shift 16
> chunk 0: nmap 1 nents 1
> sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536

So the kernel maps 1 full page for this MR. But note that the dma address
is 5bff4000, which is 4KB aligned, not 64KB aligned. I think this is
causing grief to the RDMA HW.

My first question is: Is there an assumption or requirement in linux that
dma_addresses should have the same alignment as the host address they are
mapped to? IE the rdma core is mapping the entire 64KB page, but the
mapping doesn't begin on a 64KB page boundary.

If this mapping is considered valid, then perhaps the rdma hw is at fault
here. But I'm wondering if this is a PPC/iommu bug.

BTW: Here is what the Memory Region looks like to the HW:

> TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2
> perms RW rem_inv_dis 0 addr_type VATO
> bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0
> len 12288 va 2d9a3000 bind_cnt 0
> PBL: 5bff4000

Any thoughts?

Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote:
> Hey Roland (and any iommu/ppc/dma experts out there):
>
> I'm debugging a data corruption issue that happens on PPC64 systems
> running rdma on kernels where the iommu page size is 4KB yet the host
> page size is 64KB. This "feature" was added to the PPC64 code recently,
> and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB
> page size, no problems. If the kernel is prior to 2.6.23 then 64KB page
> configs work too. It's just a problem when the iommu page size != host
> page size.
>
> It appears that my problem boils down to a single host page of memory
> that is mapped for dma, and the dma address returned by dma_map_sg() is
> _not_ 64KB aligned. Here is an example:
>
> app registers va 0x2d9a3000 len 12288
> ib_umem_get() creates and maps a umem and chunk that looks like (dumping
> state from a registered user memory region):
>
> > umem len 12288 off 12288 pgsz 65536 shift 16
> > chunk 0: nmap 1 nents 1
> > sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536
>
> So the kernel maps 1 full page for this MR. But note that the dma
> address is 5bff4000, which is 4KB aligned, not 64KB aligned. I think
> this is causing grief to the RDMA HW.
>
> My first question is: Is there an assumption or requirement in linux
> that dma_addresses should have the same alignment as the host address
> they are mapped to? IE the rdma core is mapping the entire 64KB page,
> but the mapping doesn't begin on a 64KB page boundary.
>
> If this mapping is considered valid, then perhaps the rdma hw is at
> fault here. But I'm wondering if this is a PPC/iommu bug.
>
> BTW: Here is what the Memory Region looks like to the HW:
>
> > TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2
> > perms RW rem_inv_dis 0 addr_type VATO
> > bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0
> > len 12288 va 2d9a3000 bind_cnt 0
> > PBL: 5bff4000
>
> Any thoughts?

The Ammasso certainly works this way. If you tell it the page size is
64KB, it will ignore bits in the page address that encode 0-65535.

> Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote: Hey Roland (and any iommu/ppc/dma experts out there): I'm debugging a data corruption issue that happens on PPC64 systems running rdma on kernels where the iommu page size is 4KB yet the host page size is 64KB. This feature was added to the PPC64 code recently, and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB page size, no problems. If the kernel is prior to 2.6.23 then 64KB page configs work too. Its just a problem when the iommu page size != host page size. It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: app registers va 0x2d9a3000 len 12288 ib_umem_get() creates and maps a umem and chunk that looks like (dumping state from a registered user memory region): umem len 12288 off 12288 pgsz 65536 shift 16 chunk 0: nmap 1 nents 1 sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536 So the kernel maps 1 full page for this MR. But note that the dma address is 5bff4000 which is 4KB aligned, not 64KB aligned. I think this is causing grief to the RDMA HW. My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. If this mapping is considered valid, then perhaps the rdma hw is at fault here. But I'm wondering if this is an PPC/iommu bug. BTW: Here is what the Memory Region looks like to the HW: TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2 perms RW rem_inv_dis 0 addr_type VATO bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0 len 12288 va 2d9a3000 bind_cnt 0 PBL: 5bff4000 Any thoughts? The Ammasso certainly works this way. 
If you tell it the page size is 64KB, it will ignore bits in the page address that encode 0-65535. Steve. ___ general mailing list [EMAIL PROTECTED] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
iommu dma mapping alignment requirements
Hey Roland (and any iommu/ppc/dma experts out there): I'm debugging a data corruption issue that happens on PPC64 systems running rdma on kernels where the iommu page size is 4KB yet the host page size is 64KB. This feature was added to the PPC64 code recently, and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB page size, no problems. If the kernel is prior to 2.6.23 then 64KB page configs work too. Its just a problem when the iommu page size != host page size. It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: app registers va 0x2d9a3000 len 12288 ib_umem_get() creates and maps a umem and chunk that looks like (dumping state from a registered user memory region): umem len 12288 off 12288 pgsz 65536 shift 16 chunk 0: nmap 1 nents 1 sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536 So the kernel maps 1 full page for this MR. But note that the dma address is 5bff4000 which is 4KB aligned, not 64KB aligned. I think this is causing grief to the RDMA HW. My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. If this mapping is considered valid, then perhaps the rdma hw is at fault here. But I'm wondering if this is an PPC/iommu bug. BTW: Here is what the Memory Region looks like to the HW: TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2 perms RW rem_inv_dis 0 addr_type VATO bind_enable 1 pg_size 65536 qpid 0x0 pbl_addr 0x003c67c0 len 12288 va 2d9a3000 bind_cnt 0 PBL: 5bff4000 Any thoughts? Steve. 
-- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] iommu dma mapping alignment requirements
It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. I don't think this is explicitly documented anywhere, but it certainly seems that we want the bus address to be page-aligned in this case. For mthca/mlx4 at least, we tell the adapter what the host page size is (so that it knows how to align doorbell pages etc) and I think this sort of thing would confuse the HW. - R. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] iommu dma mapping alignment requirements
Roland Dreier wrote: It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. I don't think this is explicitly documented anywhere, but it certainly seems that we want the bus address to be page-aligned in this case. For mthca/mlx4 at least, we tell the adapter what the host page size is (so that it knows how to align doorbell pages etc) and I think this sort of thing would confuse the HW. - R. In arch/powerpc/kernel/iommu.c:iommu_map_sg() I see that it calls iommu_range_alloc() with a alignment_order of 0: vaddr = (unsigned long)page_address(s-page) + s-offset; npages = iommu_num_pages(vaddr, slen); entry = iommu_range_alloc(tbl, npages, handle, mask IOMMU_PAGE_SHIFT, 0); But perhaps the alignment order needs to be based on the host page size? Steve. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ofa-general] iommu dma mapping alignment requirements
Steve Wise wrote: Roland Dreier wrote: It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. I don't think this is explicitly documented anywhere, but it certainly seems that we want the bus address to be page-aligned in this case. For mthca/mlx4 at least, we tell the adapter what the host page size is (so that it knows how to align doorbell pages etc) and I think this sort of thing would confuse the HW. - R. In arch/powerpc/kernel/iommu.c:iommu_map_sg() I see that it calls iommu_range_alloc() with a alignment_order of 0: vaddr = (unsigned long)page_address(s-page) + s-offset; npages = iommu_num_pages(vaddr, slen); entry = iommu_range_alloc(tbl, npages, handle, mask IOMMU_PAGE_SHIFT, 0); But perhaps the alignment order needs to be based on the host page size? Or based on the alignment of vaddr actually... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: iommu dma mapping alignment requirements
Adding A few more people to the discussion. You may well be right and we would have to provide the same alignment, though that sucks a bit as one of the reason we switched to 4K for the IOMMU is that the iommu space available on pSeries is very small and we were running out of it with 64K pages and lots of networking activity. On Thu, 2007-12-20 at 11:14 -0600, Steve Wise wrote: Hey Roland (and any iommu/ppc/dma experts out there): I'm debugging a data corruption issue that happens on PPC64 systems running rdma on kernels where the iommu page size is 4KB yet the host page size is 64KB. This feature was added to the PPC64 code recently, and is in kernel.org from 2.6.23. So if the kernel is built with a 4KB page size, no problems. If the kernel is prior to 2.6.23 then 64KB page configs work too. Its just a problem when the iommu page size != host page size. It appears that my problem boils down to a single host page of memory that is mapped for dma, and the dma address returned by dma_map_sg() is _not_ 64KB aligned. Here is an example: app registers va 0x2d9a3000 len 12288 ib_umem_get() creates and maps a umem and chunk that looks like (dumping state from a registered user memory region): umem len 12288 off 12288 pgsz 65536 shift 16 chunk 0: nmap 1 nents 1 sglist[0] page 0xc0930b08 off 0 len 65536 dma_addr 5bff4000 dma_len 65536 So the kernel maps 1 full page for this MR. But note that the dma address is 5bff4000 which is 4KB aligned, not 64KB aligned. I think this is causing grief to the RDMA HW. My first question is: Is there an assumption or requirement in linux that dma_addressess should have the same alignment as the host address they are mapped to? IE the rdma core is mapping the entire 64KB page, but the mapping doesn't begin on a 64KB page boundary. If this mapping is considered valid, then perhaps the rdma hw is at fault here. But I'm wondering if this is an PPC/iommu bug. 
BTW: Here is what the Memory Region looks like to the HW:

	TPT entry: stag idx 0x2e800 key 0xff state VAL type NSMR pdid 0x2
	perms RW rem_inv_dis 0 addr_type VATO bind_enable 1 pg_size 65536
	qpid 0x0 pbl_addr 0x003c67c0 len 12288 va 2d9a3000 bind_cnt 0
	PBL: 5bff4000

Any thoughts? Steve.
Re: [ofa-general] iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 13:29 -0600, Steve Wise wrote: Or based on the alignment of vaddr actually...

The latter wouldn't be realistic. What I think might be necessary, though it would definitely cause us problems with running out of iommu space (which is the reason we did the switch down to 4K), is to provide alignment to the real page size, and alignment to the allocation order for dma_map_consistent.

It might be possible to -tweak- and only provide alignment to the page size for allocations that are larger than IOMMU_PAGE_SIZE. That would still avoid the problem of small network packets eating up too much iommu space. What do you think ?

Ben.
Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote: Adding a few more people to the discussion. You may well be right and we would have to provide the same alignment, though that sucks a bit, as one of the reasons we switched to 4K for the IOMMU is that the iommu space available on pSeries is very small and we were running out of it with 64K pages and lots of networking activity.

But smarter NIC drivers could resolve this too, I think, perhaps by carving up full pages of mapped buffers instead of just assuming mapping is free... perhaps...
Re: [ofa-general] iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote: On Thu, 2007-12-20 at 13:29 -0600, Steve Wise wrote: Or based on the alignment of vaddr actually... The latter wouldn't be realistic. What I think might be necessary, though it would definitely cause us problems with running out of iommu space (which is the reason we did the switch down to 4K), is to provide alignment to the real page size, and alignment to the allocation order for dma_map_consistent. It might be possible to -tweak- and only provide alignment to the page size for allocations that are larger than IOMMU_PAGE_SIZE. That would still avoid the problem of small network packets eating up too much iommu space. What do you think ?

That might work. If you gimme a patch, I'll try it out!

Steve.
Re: iommu dma mapping alignment requirements
On Thu, 2007-12-20 at 15:02 -0600, Steve Wise wrote: Benjamin Herrenschmidt wrote: Adding a few more people to the discussion. You may well be right and we would have to provide the same alignment, though that sucks a bit, as one of the reasons we switched to 4K for the IOMMU is that the iommu space available on pSeries is very small and we were running out of it with 64K pages and lots of networking activity. But smarter NIC drivers could resolve this too, I think, perhaps by carving up full pages of mapped buffers instead of just assuming mapping is free...

True, but the problem still happens today: if we switch back to 64K iommu page size (which should be possible, I need to fix that), we -will- run out of iommu space on typical workloads, and that is not acceptable. So we need to find a compromise.

What I might do is something along the lines of: if size >= PAGE_SIZE, and vaddr (page_address + offset) is PAGE_SIZE aligned, then I enforce alignment of the resulting mapping. That should fix your case. Anything requesting smaller than PAGE_SIZE mappings would lose that alignment, but I -think- it should be safe, and you still always get 4K alignment anyway (+/- your offset), so at least small alignment restrictions are still enforced (such as cache line alignment etc...). I'll send you a test patch later today.

Ben.
Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote: On Thu, 2007-12-20 at 15:02 -0600, Steve Wise wrote: But smarter NIC drivers could resolve this too, I think, perhaps by carving up full pages of mapped buffers instead of just assuming mapping is free... True, but the problem still happens today: if we switch back to 64K iommu page size (which should be possible, I need to fix that), we -will- run out of iommu space on typical workloads, and that is not acceptable. So we need to find a compromise. What I might do is something along the lines of: if size >= PAGE_SIZE, and vaddr (page_address + offset) is PAGE_SIZE aligned, then I enforce alignment of the resulting mapping. That should fix your case. Anything requesting smaller than PAGE_SIZE mappings would lose that alignment, but I -think- it should be safe, and you still always get 4K alignment anyway (+/- your offset), so at least small alignment restrictions are still enforced (such as cache line alignment etc...). I'll send you a test patch later today. Ben.

Sounds good. Thanks! Note that these smaller sub-host-page-sized mappings might pollute the address space, causing fully aligned host-page-size maps to become scarce... Maybe there's a clever way to keep those in their own segment of the address space?
Re: iommu dma mapping alignment requirements
Sounds good. Thanks! Note that these smaller sub-host-page-sized mappings might pollute the address space, causing fully aligned host-page-size maps to become scarce... Maybe there's a clever way to keep those in their own segment of the address space?

We already have a large vs. small split in the iommu virtual space to alleviate this (though it's not a hard constraint, we can still get into the "other" side if the default one is full). Try that patch and let me know:

Index: linux-work/arch/powerpc/kernel/iommu.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/iommu.c	2007-12-21 10:39:39.000000000 +1100
+++ linux-work/arch/powerpc/kernel/iommu.c	2007-12-21 10:46:18.000000000 +1100
@@ -278,6 +278,7 @@ int iommu_map_sg(struct iommu_table *tbl
 	unsigned long flags;
 	struct scatterlist *s, *outs, *segstart;
 	int outcount, incount, i;
+	unsigned int align;
 	unsigned long handle;
 
 	BUG_ON(direction == DMA_NONE);
@@ -309,7 +310,11 @@ int iommu_map_sg(struct iommu_table *tbl
 		/* Allocate iommu entries for that segment */
 		vaddr = (unsigned long) sg_virt(s);
 		npages = iommu_num_pages(vaddr, slen);
-		entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0);
+		align = 0;
+		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT && (vaddr & ~PAGE_MASK) == 0)
+			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
+		entry = iommu_range_alloc(tbl, npages, &handle,
+					  mask >> IOMMU_PAGE_SHIFT, align);
 
 		DBG("  - vaddr: %lx, size: %lx\n", vaddr, slen);
@@ -572,7 +577,7 @@ dma_addr_t iommu_map_single(struct iommu
 {
 	dma_addr_t dma_handle = DMA_ERROR_CODE;
 	unsigned long uaddr;
-	unsigned int npages;
+	unsigned int npages, align;
 
 	BUG_ON(direction == DMA_NONE);
 
@@ -580,8 +585,13 @@ dma_addr_t iommu_map_single(struct iommu
 	npages = iommu_num_pages(uaddr, size);
 
 	if (tbl) {
+		align = 0;
+		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT &&
+		    ((unsigned long)vaddr & ~PAGE_MASK) == 0)
+			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
+
 		dma_handle = iommu_alloc(tbl, vaddr, npages, direction,
-					 mask >> IOMMU_PAGE_SHIFT, 0);
+					 mask >> IOMMU_PAGE_SHIFT, align);
 		if (dma_handle == DMA_ERROR_CODE) {
 			if (printk_ratelimit()) {
 				printk(KERN_INFO "iommu_alloc failed,
Re: iommu dma mapping alignment requirements
Benjamin Herrenschmidt wrote: Sounds good. Thanks! Note that these smaller sub-host-page-sized mappings might pollute the address space, causing fully aligned host-page-size maps to become scarce... Maybe there's a clever way to keep those in their own segment of the address space? We already have a large vs. small split in the iommu virtual space to alleviate this (though it's not a hard constraint, we can still get into the "other" side if the default one is full). Try that patch and let me know:

Seems to be working! :)

Index: linux-work/arch/powerpc/kernel/iommu.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/iommu.c	2007-12-21 10:39:39.000000000 +1100
+++ linux-work/arch/powerpc/kernel/iommu.c	2007-12-21 10:46:18.000000000 +1100
@@ -278,6 +278,7 @@ int iommu_map_sg(struct iommu_table *tbl
 	unsigned long flags;
 	struct scatterlist *s, *outs, *segstart;
 	int outcount, incount, i;
+	unsigned int align;
 	unsigned long handle;
 
 	BUG_ON(direction == DMA_NONE);
@@ -309,7 +310,11 @@ int iommu_map_sg(struct iommu_table *tbl
 		/* Allocate iommu entries for that segment */
 		vaddr = (unsigned long) sg_virt(s);
 		npages = iommu_num_pages(vaddr, slen);
-		entry = iommu_range_alloc(tbl, npages, &handle, mask >> IOMMU_PAGE_SHIFT, 0);
+		align = 0;
+		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT && (vaddr & ~PAGE_MASK) == 0)
+			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
+		entry = iommu_range_alloc(tbl, npages, &handle,
+					  mask >> IOMMU_PAGE_SHIFT, align);
 
 		DBG("  - vaddr: %lx, size: %lx\n", vaddr, slen);
@@ -572,7 +577,7 @@ dma_addr_t iommu_map_single(struct iommu
 {
 	dma_addr_t dma_handle = DMA_ERROR_CODE;
 	unsigned long uaddr;
-	unsigned int npages;
+	unsigned int npages, align;
 
 	BUG_ON(direction == DMA_NONE);
 
@@ -580,8 +585,13 @@ dma_addr_t iommu_map_single(struct iommu
 	npages = iommu_num_pages(uaddr, size);
 
 	if (tbl) {
+		align = 0;
+		if (IOMMU_PAGE_SHIFT < PAGE_SHIFT &&
+		    ((unsigned long)vaddr & ~PAGE_MASK) == 0)
+			align = PAGE_SHIFT - IOMMU_PAGE_SHIFT;
+
 		dma_handle = iommu_alloc(tbl, vaddr, npages, direction,
-					 mask >> IOMMU_PAGE_SHIFT, 0);
+					 mask >> IOMMU_PAGE_SHIFT, align);
 		if (dma_handle == DMA_ERROR_CODE) {
 			if (printk_ratelimit()) {
 				printk(KERN_INFO "iommu_alloc failed,
Re: iommu dma mapping alignment requirements
BTW, I need to know urgently what HW is broken by this.

Ben.