Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices
On Wed, Jul 18, 2012 at 2:10 PM, Chris Metcalf wrote:
> On 7/18/2012 12:50 PM, Bjorn Helgaas wrote:
>> On Wed, Jul 18, 2012 at 10:15 AM, Chris Metcalf wrote:
>>> On 7/13/2012 1:25 PM, Bjorn Helgaas wrote:
>>>> On Fri, Jul 13, 2012 at 11:52:11AM -0400, Chris Metcalf wrote:
>>>>> We use the same pci_iomem_resource for different domains or host
>>>>> bridges, but the MMIO apertures for each bridge do not overlap because
>>>>> non-overlapping resource ranges are allocated for each domain.
>>>> You should not use the same pci_iomem_resource for different host
>>>> bridges because that tells the PCI core that everything in
>>>> pci_iomem_resource is available for devices under every host bridge,
>>>> which I doubt is the case.
>>>>
>>>> The fact that your firmware assigns non-overlapping resources is good
>>>> and works now, but if the kernel ever needs to allocate resources
>>>> itself,
>>> Actually, we were not using any firmware. It was indeed the kernel which
>>> allocates resources from the shared pci_iomem_resource.
>> Wow. I wonder how that managed to work. Is there some information
>> that would have helped the PCI core do the right allocations? Or
>> maybe the host bridges forward everything they receive to PCI,
>> regardless of address, and any given MMIO address is only routed to
>> one of the host bridges because of the routing info in the page
>> tables?
>
> Since each host bridge contains non-overlapping ranges in its bridge
> config header, ioremap() locates the right host bridge for the target
> PCI resource address and programs the host bridge info into the MMIO
> mapping. The end result is that the MMIO address is routed to the right
> host bridge. On Tile processors, different host bridges are like
> separate IO devices, in completely separate domains.
>
>> I guess in that case, the "apertures" would basically be
>> defined by the page tables, not by the host bridges. But that still
>> doesn't explain how we would assign non-overlapping ranges to each
>> domain.
>
> Since all domains share the single resource, allocate_resource()
> "allocates an empty slot in the resource tree", giving non-overlapping
> ranges to each device.
>
> Just to confirm, I'm assuming I'll ask Linus to pull this code out of my
> tile tree when the merge window opens, right? Would you like me to add
> your name to the commit as acked or reviewed? Thanks!

Yep, just ask Linus to pull it; I don't think there's any need to
coordinate with my PCI tree since you're not using any interfaces we
changed in this cycle.

Reviewed-by: Bjorn Helgaas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices
On 7/18/2012 12:50 PM, Bjorn Helgaas wrote:
> On Wed, Jul 18, 2012 at 10:15 AM, Chris Metcalf wrote:
>> On 7/13/2012 1:25 PM, Bjorn Helgaas wrote:
>>> On Fri, Jul 13, 2012 at 11:52:11AM -0400, Chris Metcalf wrote:
>>>> We use the same pci_iomem_resource for different domains or host
>>>> bridges, but the MMIO apertures for each bridge do not overlap because
>>>> non-overlapping resource ranges are allocated for each domain.
>>> You should not use the same pci_iomem_resource for different host
>>> bridges because that tells the PCI core that everything in
>>> pci_iomem_resource is available for devices under every host bridge,
>>> which I doubt is the case.
>>>
>>> The fact that your firmware assigns non-overlapping resources is good
>>> and works now, but if the kernel ever needs to allocate resources
>>> itself,
>> Actually, we were not using any firmware. It was indeed the kernel which
>> allocates resources from the shared pci_iomem_resource.
> Wow. I wonder how that managed to work. Is there some information
> that would have helped the PCI core do the right allocations? Or
> maybe the host bridges forward everything they receive to PCI,
> regardless of address, and any given MMIO address is only routed to
> one of the host bridges because of the routing info in the page
> tables?

Since each host bridge contains non-overlapping ranges in its bridge config
header, ioremap() locates the right host bridge for the target PCI resource
address and programs the host bridge info into the MMIO mapping. The end
result is that the MMIO address is routed to the right host bridge. On Tile
processors, different host bridges are like separate IO devices, in
completely separate domains.

> I guess in that case, the "apertures" would basically be
> defined by the page tables, not by the host bridges. But that still
> doesn't explain how we would assign non-overlapping ranges to each
> domain.

Since all domains share the single resource, allocate_resource()
"allocates an empty slot in the resource tree", giving non-overlapping
ranges to each device.

Just to confirm, I'm assuming I'll ask Linus to pull this code out of my
tile tree when the merge window opens, right? Would you like me to add
your name to the commit as acked or reviewed? Thanks!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices
On Wed, Jul 18, 2012 at 10:15 AM, Chris Metcalf wrote:
> On 7/13/2012 1:25 PM, Bjorn Helgaas wrote:
>> On Fri, Jul 13, 2012 at 11:52:11AM -0400, Chris Metcalf wrote:
>>> On 6/22/2012 7:24 AM, Bjorn Helgaas wrote:
>>>> This says that your entire physical address space (currently
>>>> 0x0-0x_) is routed to the PCI bus, which is not true. I think
>>>> what you want here is pci_iomem_resource, but I'm not sure that's
>>>> set up correctly. It should contain the CPU physical addresses that
>>>> are routed to the PCI bus. Since you mention an offset, the PCI bus
>>>> addresses will be "CPU physical address - offset".
>>> Yes, we've changed it to use pci_iomem_resource. On TILE-Gx, there are
>>> two types of CPU physical addresses: physical RAM addresses and MMIO
>>> addresses. The MMIO address has the MMIO attribute in the page table.
>>> So, the physical address spaces for the RAM and the PCI are completely
>>> separate. Instead, we have the following relationship: PCI bus address
>>> = PCI resource address - offset, where the PCI resource addresses are
>>> defined by pci_iomem_resource and they are never generated by the CPU.
>> Does that mean the MMIO addresses are not accessible when the CPU
>> is in physical mode, and you can only reach them via a virtual address
>> mapped with the MMIO attribute? If so, then I guess you're basically
>> combining RAM addresses and MMIO addresses into iomem_resource by
>> using high "address bits" to represent the MMIO attribute?
>
> Yes.
>
>>> The TILE-Gx chip's CHIP_PA_WIDTH is 40-bit. In the following example,
>>> the system has 32GB RAM installed, with 16GB in each of the 2 memory
>>> controllers. For the first mvsas device, its PCI memory resource is
>>> [0x100c000, 0x100c003], the corresponding PCI bus address range is
>>> [0xc000, 0xc003] after subtracting the offset of (1ul << 40). The
>>> aforementioned PCI MMIO address's low 32 bits contain the PCI bus
>>> address.
>>>
>>> # cat /proc/iomem
>>> -3fbff : System RAM
>>> -007eeb1f : Kernel code
>>> 0086-00af6e4b : Kernel data
>>> 40-43 : System RAM
>>> 100c000-100c003 : mvsas
>>> 100c004-100c005 : mvsas
>>> 100c020-100c0203fff : sky2
>>> 100c030-100c0303fff : sata_sil24
>>> 100c0304000-100c030407f : sata_sil24
>>> 100c040-100c0403fff : sky2
>>>
>>> Note that in the above example, the 2 mvsas devices are in a separate
>>> PCI domain from the other 4 devices.
>> It sounds like you're describing something like this:
>>
>>   host bridge 0
>>     resource [mem 0x100_c000-0x100_c00f] (offset 0x100_)
>>     bus addr [mem 0xc000-0xc00f]
>>   host bridge 2
>>     resource [mem 0x100_c020-0x100_c02f] (offset 0x100_)
>>     bus addr [mem 0xc020-0xc02f]
>>   host bridge 3
>>     resource [mem 0x100_c030-0x100_c03f] (offset 0x100_)
>>     bus addr [mem 0xc030-0xc03f]
>>
>> If PCI bus addresses are simply the low 32 bits of the MMIO address,
>> there's nothing in the PCI core that should prevent you from giving a
>> full 4GB of bus address space to each bridge, e.g.:
>>
>>   host bridge 0
>>     resource [mem 0x100_-0x100_] (offset 0x100_)
>>     bus addr [mem 0x-0x]
>>   host bridge 2
>>     resource [mem 0x102_-0x102_] (offset 0x102_)
>>     bus addr [mem 0x-0x]
>>   host bridge 3
>>     resource [mem 0x103_-0x103_] (offset 0x103_)
>>     bus addr [mem 0x-0x]
>
> Good idea! But we can't use all the low addresses, i.e. a 4GB BAR window
> won't work, because we must leave some space, i.e. the low 3GB in our
> case, to allow the 32-bit devices access to the RAM. If the low 32-bit
> space is all used for BARs, the host bridge won't pass any DMA traffic
> to and from the low 4GB of RAM. We are going to use a separate MMIO
> range in [3GB, 4GB - 1] for each host bridge, with offset 0x10N_ (see
> appended revised patch).

OK. Interesting that the PIO (coming from CPU) and DMA (coming from
device) address spaces interact in this way.

>>> We use the same pci_iomem_resource for different domains or host
>>> bridges, but the MMIO apertures for each bridge do not overlap because
>>> non-overlapping resource ranges are allocated for each domain.
>> You should not use the same pci_iomem_resource for different host
>> bridges because that tells the PCI core that everything in
>> pci_iomem_resource is available for devices under every host bridge,
>> which I doubt is the case.
>>
>> The fact that your firmware assigns non-overlapping resources is good
>> and works now, but if the kernel ever needs to allocate resources
>> itself,
>
> Actually, we were not using any firmware. It was indeed the kernel which
> allocates resources from the shared pci_iomem_resource.

Wow. I wonder how that managed to work. Is there some information
that would have helped the PCI core do the right allocations? Or
maybe the host bridges forward everything they receive to PCI,
regardless of address, and any given MMIO address is only routed to
one of the host bridges because of the routing info in the page
tables?
Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices
On 7/13/2012 1:25 PM, Bjorn Helgaas wrote:
> On Fri, Jul 13, 2012 at 11:52:11AM -0400, Chris Metcalf wrote:
>> On 6/22/2012 7:24 AM, Bjorn Helgaas wrote:
>>> This says that your entire physical address space (currently
>>> 0x0-0x_) is routed to the PCI bus, which is not true. I think
>>> what you want here is pci_iomem_resource, but I'm not sure that's
>>> set up correctly. It should contain the CPU physical addresses that
>>> are routed to the PCI bus. Since you mention an offset, the PCI bus
>>> addresses will be "CPU physical address - offset".
>> Yes, we've changed it to use pci_iomem_resource. On TILE-Gx, there are
>> two types of CPU physical addresses: physical RAM addresses and MMIO
>> addresses. The MMIO address has the MMIO attribute in the page table.
>> So, the physical address spaces for the RAM and the PCI are completely
>> separate. Instead, we have the following relationship: PCI bus address
>> = PCI resource address - offset, where the PCI resource addresses are
>> defined by pci_iomem_resource and they are never generated by the CPU.
> Does that mean the MMIO addresses are not accessible when the CPU
> is in physical mode, and you can only reach them via a virtual address
> mapped with the MMIO attribute? If so, then I guess you're basically
> combining RAM addresses and MMIO addresses into iomem_resource by
> using high "address bits" to represent the MMIO attribute?

Yes.

>> The TILE-Gx chip's CHIP_PA_WIDTH is 40-bit. In the following example,
>> the system has 32GB RAM installed, with 16GB in each of the 2 memory
>> controllers. For the first mvsas device, its PCI memory resource is
>> [0x100c000, 0x100c003], the corresponding PCI bus address range is
>> [0xc000, 0xc003] after subtracting the offset of (1ul << 40). The
>> aforementioned PCI MMIO address's low 32 bits contain the PCI bus
>> address.
>>
>> # cat /proc/iomem
>> -3fbff : System RAM
>> -007eeb1f : Kernel code
>> 0086-00af6e4b : Kernel data
>> 40-43 : System RAM
>> 100c000-100c003 : mvsas
>> 100c004-100c005 : mvsas
>> 100c020-100c0203fff : sky2
>> 100c030-100c0303fff : sata_sil24
>> 100c0304000-100c030407f : sata_sil24
>> 100c040-100c0403fff : sky2
>>
>> Note that in the above example, the 2 mvsas devices are in a separate
>> PCI domain from the other 4 devices.
> It sounds like you're describing something like this:
>
>   host bridge 0
>     resource [mem 0x100_c000-0x100_c00f] (offset 0x100_)
>     bus addr [mem 0xc000-0xc00f]
>   host bridge 2
>     resource [mem 0x100_c020-0x100_c02f] (offset 0x100_)
>     bus addr [mem 0xc020-0xc02f]
>   host bridge 3
>     resource [mem 0x100_c030-0x100_c03f] (offset 0x100_)
>     bus addr [mem 0xc030-0xc03f]
>
> If PCI bus addresses are simply the low 32 bits of the MMIO address,
> there's nothing in the PCI core that should prevent you from giving a
> full 4GB of bus address space to each bridge, e.g.:
>
>   host bridge 0
>     resource [mem 0x100_-0x100_] (offset 0x100_)
>     bus addr [mem 0x-0x]
>   host bridge 2
>     resource [mem 0x102_-0x102_] (offset 0x102_)
>     bus addr [mem 0x-0x]
>   host bridge 3
>     resource [mem 0x103_-0x103_] (offset 0x103_)
>     bus addr [mem 0x-0x]

Good idea! But we can't use all the low addresses, i.e. a 4GB BAR window
won't work, because we must leave some space, i.e. the low 3GB in our
case, to allow the 32-bit devices access to the RAM. If the low 32-bit
space is all used for BARs, the host bridge won't pass any DMA traffic to
and from the low 4GB of RAM. We are going to use a separate MMIO range in
[3GB, 4GB - 1] for each host bridge, with offset 0x10N_ (see appended
revised patch).

>> We use the same pci_iomem_resource for different domains or host
>> bridges, but the MMIO apertures for each bridge do not overlap because
>> non-overlapping resource ranges are allocated for each domain.
> You should not use the same pci_iomem_resource for different host
> bridges because that tells the PCI core that everything in
> pci_iomem_resource is available for devices under every host bridge,
> which I doubt is the case.
>
> The fact that your firmware assigns non-overlapping resources is good
> and works now, but if the kernel ever needs to allocate resources
> itself,

Actually, we were not using any firmware. It was indeed the kernel which
allocates resources from the shared pci_iomem_resource.

> the only way to do it correctly is to know what the actual apertures are
> for each host bridge. Eventually, I think the host bridges will also
> show up in /proc/iomem, which won't work if their apertures overlap.

Fixed. Thanks!

diff --git a/arch/tile/include/asm/pci.h b/arch/tile/include/asm/pci.h
index 553b7ff..1ab2a58 100644
--- a/arch/tile/include/asm/pci.h
+++
Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices
On Fri, Jul 13, 2012 at 11:52:11AM -0400, Chris Metcalf wrote: > Sorry for the slow reply to your feedback; I had to coordinate with our > primary PCI developer (in another timezone) and we both had various > unrelated fires to fight along the way. > > I've appended the patch that corrects all the issues you reported. Bjorn, > I'm assuming that it's appropriate for me to push this change through the > tile tree (along with all the infrastructural changes to support the > TILE-Gx TRIO shim that implements PCIe for our chip) rather than breaking > it out to push it through the pci tree; does that sound correct to you? > > On 6/22/2012 7:24 AM, Bjorn Helgaas wrote: > > On Fri, Jun 15, 2012 at 1:23 PM, Chris Metcalf wrote: > >> This change uses the TRIO IOMMU to map the PCI DMA space and physical > >> memory at different addresses. We also now use the dma_mapping_ops > >> to provide support for non-PCI DMA, PCIe DMA (64-bit) and legacy PCI > >> DMA (32-bit). We use the kernel's software I/O TLB framework > >> (i.e. bounce buffers) for the legacy 32-bit PCI device support since > >> there are a limited number of TLB entries in the IOMMU and it is > >> non-trivial to handle indexing, searching, matching, etc. For 32-bit > >> devices the performance impact of bounce buffers should not be a concern. > >> > >> > >> +extern void > >> +pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region > >> *region, > >> + struct resource *res); > >> + > >> +extern void > >> +pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res, > >> + struct pci_bus_region *region); > > These extern declarations look like leftovers that shouldn't be needed. > > Thanks. Removed. > > >> +/* PCI I/O space support is not implemented. */ > >> +static struct resource pci_ioport_resource = { > >> + .name = "PCI IO", > >> + .start = 0, > >> + .end= 0, > >> + .flags = IORESOURCE_IO, > >> +}; > > You don't need to define pci_ioport_resource at all if you don't > > support I/O space. 
> > We have some internal changes to support I/O space, but for now I've gone > ahead and removed pci_ioport_resource. > > >> + /* > >> +* The PCI memory resource is located above the PA space. > >> +* The memory range for the PCI root bus should not overlap > >> +* with the physical RAM > >> +*/ > >> + pci_add_resource_offset(, _resource, > >> + 1ULL << CHIP_PA_WIDTH()); > > This says that your entire physical address space (currently > > 0x0-0x_) is routed to the PCI bus, which is not true. > > I think what you want here is pci_iomem_resource, but I'm not sure > > that's set up correctly. It should contain the CPU physical address > > that are routed to the PCI bus. Since you mention an offset, the PCI > > bus addresses will "CPU physical address - offset". > > Yes, we've changed it to use pci_iomem_resource. On TILE-Gx, there are two > types of CPU physical addresses: physical RAM addresses and MMIO addresses. > The MMIO address has the MMIO attribute in the page table. So, the physical > address spaces for the RAM and the PCI are completely separate. Instead, we > have the following relationship: PCI bus address = PCI resource address - > offset, where the PCI resource addresses are defined by pci_iomem_resource > and they are never generated by the CPU. Does that mean the MMIO addresses are not accessible when the CPU is in physical mode, and you can only reach them via a virtual address mapped with the MMIO attribute? If so, then I guess you're basically combining RAM addresses and MMIO addresses into iomem_resource by using high "address bits" to represent the MMIO attribute? > > I don't understand the CHIP_PA_WIDTH() usage -- that seems to be the > > physical address width, but you define TILE_PCI_MEM_END as "((1ULL << > > CHIP_PA_WIDTH()) + TILE_PCI_BAR_WINDOW_TOP)", which would mean the CPU > > could never generate that address. > > Exactly. The CPU-generated physical addresses for the PCI space, i.e. 
> the MMIO addresses, have an address format that is defined by the RC
> controller. They go to the RC controller directly, because the page table
> entry also encodes the RC controller's location on the chip.
>
> > I might understand this better if you could give a concrete example of
> > the CPU address range and the corresponding PCI bus address range.
> > For example, I have a box where CPU physical address range [mem
> > 0xf00-0xf007edf] is routed to PCI bus address range
> > [0x8000-0xfedf]. In this case, the struct resource contains
> > 0xf00-0xf007edf, and the offset is 0xf00 - 0x8000 or 0xeff8000.
>
> The TILE-Gx chip's CHIP_PA_WIDTH is 40-bit. In the following example, the
> system has 32GB RAM installed, with 16GB in each of the 2 memory
> controllers. For the first mvsas device, its PCI memory resource is
> [0x100c000, 0x100c003]; the corresponding PCI bus address range is
> [0xc000, 0xc003] after subtracting the offset of (1ul << 40). The
> aforementioned PCI MMIO address's low 32 bits contain the PCI bus address.
>
> # cat /proc/iomem
> -3fbff : System RAM
>   -007eeb1f : Kernel code
>   0086-00af6e4b : Kernel data
> 40-43 : System RAM
> 100c000-100c003 : mvsas
> 100c004-100c005 : mvsas
> 100c020-100c0203fff : sky2
> 100c030-100c0303fff : sata_sil24
> 100c0304000-100c030407f : sata_sil24
> 100c040-100c0403fff : sky2