Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Thursday, December 29, 2016 11:45:03 PM CET Nikita Yushchenko wrote:
>
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +       if (dev_is_pci(hwdev)) {
> +               struct pci_dev *pdev = to_pci_dev(hwdev);
> +               struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +               if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +                               (mask & (*br->dev.dma_mask)) != mask)
> +                       return 0;
> +       }
> +#endif
>         if (swiotlb)
>                 return swiotlb_dma_supported(hwdev, mask);
>         return 1;
>

I think it's wrong to make this a special case for PCI. Instead, we
should follow the dma-ranges properties during dma_set_mask() to
ensure we don't set a mask that any of the parents up to the root
cannot support.

        Arnd
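To illustrate the suggestion above, here is a minimal user-space sketch of checking a requested mask against every parent up to the root. It is not kernel code: `struct bus_node`, `path_dma_limit()` and `check_dma_mask()` are hypothetical names invented for this example, and each level's limit is assumed to be a contiguous all-ones mask. Only `DMA_BIT_MASK()` mirrors the real kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(), repeated for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for a device or bus level; not a kernel type. */
struct bus_node {
	struct bus_node *parent;  /* NULL at the root */
	uint64_t dma_limit;       /* widest address mask this level can pass */
};

/* Narrowest limit found on the path from dev up to the root. */
static uint64_t path_dma_limit(const struct bus_node *dev)
{
	uint64_t limit = ~0ULL;

	for (; dev; dev = dev->parent)
		limit &= dev->dma_limit;
	return limit;
}

/* Returns 0 if every level can support the mask, -1 otherwise. */
static int check_dma_mask(const struct bus_node *dev, uint64_t mask)
{
	return (mask & path_dma_limit(dev)) == mask ? 0 : -1;
}
```

With a 32-bit bus anywhere between the device and the root, a 64-bit request is refused no matter what the device and the host bridge themselves could do, which is the point of not special-casing PCI.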
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Tuesday, January 10, 2017 3:44:53 PM CET Christoph Hellwig wrote:
> On Tue, Jan 10, 2017 at 11:47:42AM +0100, Arnd Bergmann wrote:
> > I see that we have CONFIG_ARCH_PHYS_ADDR_T_64BIT on a couple of
> > 32-bit architectures without swiotlb (arc, arm, some mips32), and
> > there are several 64-bit architectures that do not have swiotlb
> > (alpha, parisc, s390, sparc). I believe that alpha, s390 and sparc
> > always use some form of IOMMU, but the other four apparently don't,
> > so we would need to add swiotlb support there to remove all the
> > bounce buffering in network and block layers.
>
> mips has lots of weird swiotlb wire-up in it's board code (the swiotlb
> arch glue really needs some major cleanup..),

My reading of the MIPS code was that only the 64-bit platforms use it,
but there are a number of 32-bit platforms that have 64-bit physical
addresses and don't.

> as does arm. Not sure about the others.

32-bit ARM doesn't actually use SWIOTLB at all, despite selecting it
in Kconfig. I think Xen uses it for its own purposes, but nothing
else does.

Most ARM platforms can't actually have RAM beyond 4GB, and the ones
that do have it tend to also come with an IOMMU, but I remember
at least BCM53xx actually needing swiotlb on some chip revisions
that are widely used and that cannot DMA to the second memory bank
from PCI (!).

        Arnd
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Tue, Jan 10, 2017 at 11:47:42AM +0100, Arnd Bergmann wrote:
> I see that we have CONFIG_ARCH_PHYS_ADDR_T_64BIT on a couple of
> 32-bit architectures without swiotlb (arc, arm, some mips32), and
> there are several 64-bit architectures that do not have swiotlb
> (alpha, parisc, s390, sparc). I believe that alpha, s390 and sparc
> always use some form of IOMMU, but the other four apparently don't,
> so we would need to add swiotlb support there to remove all the
> bounce buffering in network and block layers.

mips has lots of weird swiotlb wire-up in its board code (the swiotlb
arch glue really needs some major cleanup..), as does arm. Not sure
about the others.

Getting rid of highmem bouncing in the block layer will take some time
as various PIO-only drivers rely on it at the moment. These should
all be convertible to kmap the data, but that needs a careful audit
first. For 4.11 I plan to at least switch away from bouncing highmem
by default, and maybe also convert a few PIO drivers.
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Monday, January 9, 2017 9:57:46 PM CET Christoph Hellwig wrote: > > - architecture should stop breaking 64-bit DMA when driver attempts to > > set 64-bit dma mask, > > > > - NVMe should issue proper blk_queue_bounce_limit() call based on what > > is actually set mask, > > Or even better remove the call to dma_set_mask_and_coherent with > DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA > addressing, there is not point in trying to pretent it works without that Agreed, let's just fail the probe() if DMA_BIT_MASK(64) fails, and have swiotlb work around machines that for some reason need bounce buffers. > > - and blk_queue_bounce_limit() should also be fixed to actually set > > 0x limit, instead of replacing it with (max_low_pfn << > > PAGE_SHIFT) as it does now. > > We need to kill off BLK_BOUNCE_HIGH, it just doesn't make sense to > mix the highmem aspect with the addressing limits. In fact the whole > block bouncing scheme doesn't make much sense at all these days, we > should rely on swiotlb instead. If we do this, we should probably have another look at the respective NETIF_F_HIGHDMA support in the network stack, which does the same thing and mixes up highmem on 32-bit architectures with the DMA address limit. (side note: there are actually cases in which you have a 31-bit DMA mask but 3 GB of lowmem using CONFIG_VMSPLIT_1G, so BLK_BOUNCE_HIGH and !NETIF_F_HIGHDMA are both missing the limit, causing data corruption without swiotlb). Before we rely too much on swiotlb, we may also need to consider which architectures today rely on bouncing in blk and network. I see that we have CONFIG_ARCH_PHYS_ADDR_T_64BIT on a couple of 32-bit architectures without swiotlb (arc, arm, some mips32), and there are several 64-bit architectures that do not have swiotlb (alpha, parisc, s390, sparc). 
I believe that alpha, s390 and sparc always use some form of IOMMU, but the other four apparently don't, so we would need to add swiotlb support there to remove all the bounce buffering in network and block layers. Arnd
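The side note about a 31-bit DMA mask with 3 GB of lowmem can be checked with simple arithmetic. The sketch below is a user-space illustration only: `highmem_only_bounce_is_safe()` is a hypothetical helper, and lowmem is assumed to start at physical address 0. It models a policy that bounces only pages above the end of lowmem, which is what BLK_BOUNCE_HIGH and !NETIF_F_HIGHDMA amount to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(), repeated for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical check: bouncing only highmem is safe when every lowmem
 * address is reachable by the device. lowmem_size is the number of
 * bytes of lowmem, assumed (for this sketch) to start at address 0.
 */
static bool highmem_only_bounce_is_safe(uint64_t lowmem_size,
					uint64_t dma_mask)
{
	assert(lowmem_size > 0);
	return lowmem_size - 1 <= dma_mask;
}
```

For 3 GB of lowmem the last lowmem address is 0xBFFFFFFF, which is above the 31-bit limit of 0x7FFFFFFF, so the highmem-only policy leaves lowmem pages that the device cannot reach.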
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Mon, Jan 09, 2017 at 11:34:55PM +0300, Nikita Yushchenko wrote: > I believe the bounce buffering code you refer to is not in SATA/SCSI/MMC > but in block layer, in particular it should be controlled by > blk_queue_bounce_limit(). [Yes there is CONFIG_MMC_BLOCK_BOUNCE but it > is something completely different, namely it is for request merging for > hw not supporting scatter-gather]. And NVMe also uses block layer and > thus should get same support. NVMe shouldn't have to call blk_queue_bounce_limit - blk_queue_bounce_limit is to set the DMA addressing limit of the device. NVMe devices must support unlimited 64-bit addressing and thus calling blk_queue_bounce_limit from NVMe does not make sense. That being said currently the default for a queue without a call to blk_queue_make_request which does the wrong thing on highmem setups, so we should fix it. In fact BLK_BOUNCE_HIGH as-is doesn't really make much sense these days as no driver should ever dereference pages passed to it directly. > Maybe fixing that, together with making NVMe use this API, could stop it > from issuing dma_map()s of addresses beyond mask. NVMe should never bounce, the fact that it currently possibly does for highmem pages is a bug. > As for PCI_DMA_BUS_IS_PHYS - ironically, what all current usages of this > macro in the kernel do is - *disable* bounce buffers in block layer if > PCI_DMA_BUS_IS_PHYS is zero. That's not ironic but the whole point of the macro (horrible name and the fact that it should be a dma_ops setting aside). > - architecture should stop breaking 64-bit DMA when driver attempts to > set 64-bit dma mask, > > - NVMe should issue proper blk_queue_bounce_limit() call based on what > is actually set mask, Or even better remove the call to dma_set_mask_and_coherent with DMA_BIT_MASK(32). 
NVMe is designed around having proper 64-bit DMA addressing, there is not point in trying to pretent it works without that > - and blk_queue_bounce_limit() should also be fixed to actually set > 0x limit, instead of replacing it with (max_low_pfn << > PAGE_SHIFT) as it does now. We need to kill off BLK_BOUNCE_HIGH, it just doesn't make sense to mix the highmem aspect with the addressing limits. In fact the whole block bouncing scheme doesn't make much sense at all these days, we should rely on swiotlb instead. > What I mean is some API to allocate memory for use with streaming DMA in > such way that bounce buffers won't be needed. There are many cases when > at buffer allocation time, it is already known that buffer will be used > for DMA with particular device. Bounce buffers will still be needed > cases when no such information is available at allocation time, or when > there is no directly-DMAable memory available at allocation time. For block I/O that is never the case.
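As a rough illustration of dropping the DMA_BIT_MASK(32) fallback, the following user-space sketch models a probe that requires 64-bit addressing and refuses to bind otherwise. Everything prefixed `fake_` is hypothetical: `fake_dma_set_mask()` merely stands in for dma_set_mask_and_coherent(), and `platform_mask` is an assumed summary of what the platform (including swiotlb or an IOMMU, if present) can satisfy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(), repeated for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device model, not a kernel structure. */
struct fake_dev {
	uint64_t platform_mask;  /* what the platform can satisfy */
	uint64_t dma_mask;       /* mask accepted for the device, 0 if none */
};

/* Stand-in for dma_set_mask_and_coherent(): fails instead of clamping. */
static int fake_dma_set_mask(struct fake_dev *dev, uint64_t mask)
{
	if ((mask & dev->platform_mask) != mask)
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Probe that insists on 64-bit DMA, with no 32-bit retry. */
static int fake_nvme_probe(struct fake_dev *dev)
{
	int ret = fake_dma_set_mask(dev, DMA_BIT_MASK(64));

	if (ret)
		return ret;  /* refuse to bind rather than risk corruption */
	return 0;
}
```

The design choice here is that the driver carries no bounce logic of its own: either the platform accepts the 64-bit mask (directly or via swiotlb/IOMMU), or the probe fails visibly.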
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
[CCing NVMe maintainers since we are discussion issues in that driver] >> With my patch applied and thus 32bit dma_mask set for NVMe device, I do >> see high addresses passed to dma_map_*() routines and handled by >> swiotlb. Thus your statement that behavior "succeed 64bit dma_set_mask() >> operation but silently replace mask behind the scene" is required for >> swiotlb to be used, does not match reality. > > See my point about drivers that don't implement bounce buffering. > Apparently NVMe is one of them, unlike the SATA/SCSI/MMC storage > drivers that do their own thing. I believe the bounce buffering code you refer to is not in SATA/SCSI/MMC but in block layer, in particular it should be controlled by blk_queue_bounce_limit(). [Yes there is CONFIG_MMC_BLOCK_BOUNCE but it is something completely different, namely it is for request merging for hw not supporting scatter-gather]. And NVMe also uses block layer and thus should get same support. But blk_queue_bounce_limit() is somewhat broken, it has very strange code under #if BITS_PER_LONG == 64 that makes setting max_addr to 0x not working if max_low_pfn is above 4G. Maybe fixing that, together with making NVMe use this API, could stop it from issuing dma_map()s of addresses beyond mask. > What I think happened here in chronological order is: > > - In the old days, 64-bit architectures tended to use an IOMMU > all the time to work around 32-bit limitations on DMA masters > - Some architectures had no IOMMU that fully solved this and the > dma-mapping API required drivers to set the right mask and check > the return code. If this failed, the driver needed to use its > own bounce buffers as network and scsi do. See also the > grossly misnamed "PCI_DMA_BUS_IS_PHYS" macro. 
> - As we never had support for bounce buffers in all drivers, and > early 64-bit Intel machines had no IOMMU, the swiotlb code was > introduced as a workaround, so we can use the IOMMU case without > driver specific bounce buffers everywhere > - As most of the important 64-bit architectures (x86, arm64, powerpc) > now always have either IOMMU or swiotlb enabled, drivers like > NVMe started relying on it, and no longer handle a dma_set_mask > failure properly. ... and architectures started to add to this breakage, not handling dma_set_mask() as documented. As for PCI_DMA_BUS_IS_PHYS - ironically, what all current usages of this macro in the kernel do is - *disable* bounce buffers in block layer if PCI_DMA_BUS_IS_PHYS is zero. Defining it to zero (as arm64 currently does) on system with memory above 4G makes all block drivers to depend on swiotlb (or iommu). Affected drivers are SCSI and IDE. > We may need to audit how drivers typically handle dma_set_mask() > failure. The NVMe driver in its current state will probably cause > silent data corruption when used on a 64-bit architecture that has > a 32-bit bus but neither swiotlb nor iommu enabled at runtime. With current code NVME causes system memory breakage even if swiotlb is there - because it's dma_set_mask_and_coherent(DMA_BIT_MASK(64)) call has effect of silent disable of swiotlb. > I would argue that the driver should be fixed to either refuse > working in that configuration to avoid data corruption, or that > it should implement bounce buffering like SCSI does. Difference from "SCSI" (actually - from block drivers that work) is in that dma_set_mask_and_coherent(DMA_BIT_MASK(64)) call: driver that does not do it works, driver that does it fails. Per documentation, driver *should* do it if it's hardware supports 64-bit dma, and platform *should* either fail this call, or ensure that 64-bit addresses can be dma_map()ed successfully. 
So what we have on arm64 is - drivers that follow documented procedure fail, drivers that don't follow it work, That's nonsense. > If we make it > simply not work, then your suggestion of making dma_set_mask() > fail will break your system in a different way. Proper fix should fix *both* architecture and NVMe. - architecture should stop breaking 64-bit DMA when driver attempts to set 64-bit dma mask, - NVMe should issue proper blk_queue_bounce_limit() call based on what is actually set mask, - and blk_queue_bounce_limit() should also be fixed to actually set 0x limit, instead of replacing it with (max_low_pfn << PAGE_SHIFT) as it does now. >> Still current code does not work, thus fix is needed. >> >> Perhaps need to introduce some generic API to "allocate memory best >> suited for DMA to particular device", and fix allocation points (in >> drivers, filesystems, etc) to use it. Such an API could try to allocate >> area that can be DMAed by hardware, and fallback to other memory that >> can be used via swiotlb or other bounce buffer implementation. > > The DMA mapping API is meant to do this, but we can definitely improve > it or clarify some of the rules. DMA mapping API can't help here, it's about mapping, not about allocation. What I
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Friday, January 6, 2017 4:47:59 PM CET Nikita Yushchenko wrote: > >>> Just a guess, but if the inbound translation windows in the host > >>> bridge are wider than 32-bit, the reason for setting up a single > >>> 32-bit window is probably because that is what the parent bus supports. > > I've re-checked rcar-pcie hardware documentation. > > It indeed mentions that AXI bus it sits on is 32-bit. > > > >> Well anyway applying patch similar to your's will fix pcie-rcar + nvme > >> case - thus I don't object :) But it can break other cases ... > >> > >> But why do you hook at set_dma_mask() and overwrite mask inside, instead > >> of hooking at dma_supported() and rejecting unsupported mask? > >> > >> I think later is better, because it lets drivers to handle unsupported > >> high-dma case, like documented in DMA-API_HOWTO. > > > > I think the behavior I put in there is required for swiotlb to make > > sense, otherwise you would rely on the driver to handle dma_set_mask() > > failure gracefully with its own bounce buffers (as network and > > scsi drivers do but others don't). > > > > Having swiotlb or iommu enabled should result in dma_set_mask() always > > succeeding unless the mask is too small to cover the swiotlb > > bounce buffer area or the iommu virtual address space. This behavior > > is particularly important in case the bus address space is narrower > > than 32-bit, as we have to guarantee that the fallback to 32-bit > > DMA always succeeds. There are also a lot of drivers that try to > > set a 64-bit mask but don't implement bounce buffers for streaming > > mappings if that fails, and swiotlb is what we use to make those > > drivers work. > > > > And yes, the API is a horrible mess. > > With my patch applied and thus 32bit dma_mask set for NVMe device, I do > see high addresses passed to dma_map_*() routines and handled by > swiotlb. 
Thus your statement that behavior "succeed 64bit dma_set_mask() > operation but silently replace mask behind the scene" is required for > swiotlb to be used, does not match reality. See my point about drivers that don't implement bounce buffering. Apparently NVMe is one of them, unlike the SATA/SCSI/MMC storage drivers that do their own thing. The problem again is the inconsistency of the API. > It can be interpreted as a breakage elsewhere, but it's hard to point > particular "root cause". The entire infrastructure to allocate and use > DMA memory is messy. Absolutely. What I think happened here in chronological order is: - In the old days, 64-bit architectures tended to use an IOMMU all the time to work around 32-bit limitations on DMA masters - Some architectures had no IOMMU that fully solved this and the dma-mapping API required drivers to set the right mask and check the return code. If this failed, the driver needed to use its own bounce buffers as network and scsi do. See also the grossly misnamed "PCI_DMA_BUS_IS_PHYS" macro. - As we never had support for bounce buffers in all drivers, and early 64-bit Intel machines had no IOMMU, the swiotlb code was introduced as a workaround, so we can use the IOMMU case without driver specific bounce buffers everywhere - As most of the important 64-bit architectures (x86, arm64, powerpc) now always have either IOMMU or swiotlb enabled, drivers like NVMe started relying on it, and no longer handle a dma_set_mask failure properly. We may need to audit how drivers typically handle dma_set_mask() failure. The NVMe driver in its current state will probably cause silent data corruption when used on a 64-bit architecture that has a 32-bit bus but neither swiotlb nor iommu enabled at runtime. I would argue that the driver should be fixed to either refuse working in that configuration to avoid data corruption, or that it should implement bounce buffering like SCSI does. 
If we make it simply not work, then your suggestion of making dma_set_mask() fail will break your system in a different way. > Still current code does not work, thus fix is needed. > > Perhaps need to introduce some generic API to "allocate memory best > suited for DMA to particular device", and fix allocation points (in > drivers, filesystems, etc) to use it. Such an API could try to allocate > area that can be DMAed by hardware, and fallback to other memory that > can be used via swiotlb or other bounce buffer implementation. The DMA mapping API is meant to do this, but we can definitely improve it or clarify some of the rules. > But for now, have to stay with dma masks. Will follow-up with a patch > based on your but with coherent mask handling added. Ok. Arnd
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
>>> Just a guess, but if the inbound translation windows in the host >>> bridge are wider than 32-bit, the reason for setting up a single >>> 32-bit window is probably because that is what the parent bus supports. I've re-checked rcar-pcie hardware documentation. It indeed mentions that AXI bus it sits on is 32-bit. >> Well anyway applying patch similar to your's will fix pcie-rcar + nvme >> case - thus I don't object :) But it can break other cases ... >> >> But why do you hook at set_dma_mask() and overwrite mask inside, instead >> of hooking at dma_supported() and rejecting unsupported mask? >> >> I think later is better, because it lets drivers to handle unsupported >> high-dma case, like documented in DMA-API_HOWTO. > > I think the behavior I put in there is required for swiotlb to make > sense, otherwise you would rely on the driver to handle dma_set_mask() > failure gracefully with its own bounce buffers (as network and > scsi drivers do but others don't). > > Having swiotlb or iommu enabled should result in dma_set_mask() always > succeeding unless the mask is too small to cover the swiotlb > bounce buffer area or the iommu virtual address space. This behavior > is particularly important in case the bus address space is narrower > than 32-bit, as we have to guarantee that the fallback to 32-bit > DMA always succeeds. There are also a lot of drivers that try to > set a 64-bit mask but don't implement bounce buffers for streaming > mappings if that fails, and swiotlb is what we use to make those > drivers work. > > And yes, the API is a horrible mess. With my patch applied and thus 32bit dma_mask set for NVMe device, I do see high addresses passed to dma_map_*() routines and handled by swiotlb. Thus your statement that behavior "succeed 64bit dma_set_mask() operation but silently replace mask behind the scene" is required for swiotlb to be used, does not match reality. 
It can be interpreted as a breakage elsewhere, but it's hard to point particular "root cause". The entire infrastructure to allocate and use DMA memory is messy. Still current code does not work, thus fix is needed. Perhaps need to introduce some generic API to "allocate memory best suited for DMA to particular device", and fix allocation points (in drivers, filesystems, etc) to use it. Such an API could try to allocate area that can be DMAed by hardware, and fallback to other memory that can be used via swiotlb or other bounce buffer implementation. But for now, have to stay with dma masks. Will follow-up with a patch based on your but with coherent mask handling added. Nikita
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Wednesday, January 4, 2017 6:29:39 PM CET Nikita Yushchenko wrote:
> > Just a guess, but if the inbound translation windows in the host
> > bridge are wider than 32-bit, the reason for setting up a single
> > 32-bit window is probably because that is what the parent bus supports.
>
> Well anyway applying patch similar to your's will fix pcie-rcar + nvme
> case - thus I don't object :) But it can break other cases ...
>
> But why do you hook at set_dma_mask() and overwrite mask inside, instead
> of hooking at dma_supported() and rejecting unsupported mask?
>
> I think later is better, because it lets drivers to handle unsupported
> high-dma case, like documented in DMA-API_HOWTO.

I think the behavior I put in there is required for swiotlb to make
sense, otherwise you would rely on the driver to handle dma_set_mask()
failure gracefully with its own bounce buffers (as network and
scsi drivers do but others don't).

Having swiotlb or iommu enabled should result in dma_set_mask() always
succeeding unless the mask is too small to cover the swiotlb
bounce buffer area or the iommu virtual address space. This behavior
is particularly important in case the bus address space is narrower
than 32-bit, as we have to guarantee that the fallback to 32-bit
DMA always succeeds. There are also a lot of drivers that try to
set a 64-bit mask but don't implement bounce buffers for streaming
mappings if that fails, and swiotlb is what we use to make those
drivers work.

And yes, the API is a horrible mess.

        Arnd
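The rule described above (the mask only has to cover the bounce buffer area, and anything outside the mask is bounced at map time) can be sketched as follows. This is a simplified user-space model, not the swiotlb implementation: `struct swiotlb_model` and both helpers are hypothetical, DMA addresses are assumed equal to physical addresses, and `addr + size` is assumed not to wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(), repeated for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical model of the bounce pool; not a kernel structure. */
struct swiotlb_model {
	uint64_t pool_last;  /* address of the last byte of the bounce pool */
};

/* Setting a mask succeeds as long as the device can reach the pool. */
static int model_set_mask(const struct swiotlb_model *s, uint64_t mask)
{
	return s->pool_last <= mask ? 0 : -EIO;
}

/* A streaming mapping is bounced when the buffer ends above the mask. */
static bool model_needs_bounce(uint64_t addr, uint64_t size, uint64_t mask)
{
	assert(size > 0);
	return addr + size - 1 > mask;
}
```

Under this model a device with a 32-bit mask can still be handed a buffer above 4 GB; the mapping step copies through the pool instead of failing.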
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
For OF platforms, this is called via of_dma_configure(), that checks dma-ranges of node that is *parent* for host bridge. Host bridge currently does not control this at all. >>> >>> We need to think about this a bit. Is it actually the PCI host >>> bridge that limits the ranges here, or the bus that it is connected >>> to. In the latter case, the caller needs to be adapted to handle >>> both. >> >> In r-car case, I'm not sure what is the source of limitation at physical >> level. >> >> pcie-rcar driver configures ranges for PCIe inbound transactions based >> on dma-ranges property in it's device tree node. In the current device >> tree for this platform, that only contains one range and it is in lower >> memory. >> >> NVMe driver tries i/o to kmalloc()ed area. That returns 0x5 >> addresses here. As a quick experiment, I tried to add second range to >> pcie-rcar's dma-ranges to cover 0x5 area - but that did not make >> DMA to high addresses working. >> >> My current understanding is that host bridge hardware module can't >> handle inbound transactions to PCI addresses above 4G - and this >> limitations comes from host bridge itself. >> >> I've read somewhere in the lists that pcie-rcar hardware is "32-bit" - >> but I don't remember where, and don't know lowlevel details. Maybe >> somebody from linux-renesas can elaborate? > > Just a guess, but if the inbound translation windows in the host > bridge are wider than 32-bit, the reason for setting up a single > 32-bit window is probably because that is what the parent bus supports. Well anyway applying patch similar to your's will fix pcie-rcar + nvme case - thus I don't object :) But it can break other cases ... But why do you hook at set_dma_mask() and overwrite mask inside, instead of hooking at dma_supported() and rejecting unsupported mask? I think later is better, because it lets drivers to handle unsupported high-dma case, like documented in DMA-API_HOWTO.
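The two approaches being compared here, rejecting an unsupported mask versus accepting it and narrowing it internally, differ only in what the caller gets to see. The user-space sketch below shows both side by side; `set_mask_reject()` and `set_mask_clamp()` are hypothetical names and do not correspond to kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(), repeated for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Policy 1 (dma_supported()-style): a mask the bus cannot honour is
 * rejected, so the driver sees the failure and can react.
 */
static int set_mask_reject(uint64_t bus_mask, uint64_t requested,
			   uint64_t *effective)
{
	if ((requested & bus_mask) != requested)
		return -EIO;
	*effective = requested;
	return 0;
}

/*
 * Policy 2 (overwrite inside set_dma_mask()): always succeeds, but the
 * effective mask is silently narrowed to what the bus supports.
 */
static int set_mask_clamp(uint64_t bus_mask, uint64_t requested,
			  uint64_t *effective)
{
	*effective = requested & bus_mask;
	return 0;
}
```

With policy 1 the driver must have a fallback path; with policy 2 something below the driver (swiotlb or an IOMMU) must handle addresses beyond the narrowed mask.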
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Wednesday, January 4, 2017 5:30:19 PM CET Nikita Yushchenko wrote: > >> For OF platforms, this is called via of_dma_configure(), that checks > >> dma-ranges of node that is *parent* for host bridge. Host bridge > >> currently does not control this at all. > > > > We need to think about this a bit. Is it actually the PCI host > > bridge that limits the ranges here, or the bus that it is connected > > to. In the latter case, the caller needs to be adapted to handle > > both. > > In r-car case, I'm not sure what is the source of limitation at physical > level. > > pcie-rcar driver configures ranges for PCIe inbound transactions based > on dma-ranges property in it's device tree node. In the current device > tree for this platform, that only contains one range and it is in lower > memory. > > NVMe driver tries i/o to kmalloc()ed area. That returns 0x5 > addresses here. As a quick experiment, I tried to add second range to > pcie-rcar's dma-ranges to cover 0x5 area - but that did not make > DMA to high addresses working. > > My current understanding is that host bridge hardware module can't > handle inbound transactions to PCI addresses above 4G - and this > limitations comes from host bridge itself. > > I've read somewhere in the lists that pcie-rcar hardware is "32-bit" - > but I don't remember where, and don't know lowlevel details. Maybe > somebody from linux-renesas can elaborate? Just a guess, but if the inbound translation windows in the host bridge are wider than 32-bit, the reason for setting up a single 32-bit window is probably because that is what the parent bus supports. > >> In current device trees no dma-ranges is defined for nodes that are > >> parents to pci host bridges. This will make of_dma_configure() to fall > >> back to 32-bit size for all devices on all current platforms. Thus > >> applying this patch will immediately break 64-bit dma masks on all > >> hardware that supports it. 
> > > > No, it won't break it, it will just fall back to swiotlb for all the > > ones that are lacking the dma-ranges property. I think this is correct > > behavior. > > I'd say - for all ones that have parents without dma-ranges property. > > As of 4.10-rc2, I see only two definitions of wide parent dma-ranges > under arch/arm64/boot/dts/ - in amd/amd-seattle-soc.dtsi and > apm/apm-storm.dtsi > > Are these the only arm64 platforms that can to DMA to high addresses? > I'm not arm64 expert but I'd be surprised if that's the case. It's likely that a few others also do high DMA, but a lot of arm64 chips are actually derived from earlier 32-bit chips and don't even support any RAM above 4GB, as well as having a lot of 32-bit DMA masters. > >> Also related: dma-ranges property used by several pci host bridges is > >> *not* compatible with "legacy" dma-ranges parsed by of_get_dma_range() - > >> former uses additional flags word at beginning. > > > > Can you elaborate? Do we have PCI host bridges that use wrongly formatted > > dma-ranges properties? > > of_dma_get_range() expects format. > > pcie-rcar.c, pci-rcar-gen2.c, pci-xgene.c and pcie-iproc.c from > drivers/pci/host/ all parse dma-ranges using of_pci_range_parser that > uses format - i.e. something different > from what of_dma_get_range() uses. The "dma_addr" here is expressed in terms of #address-cells of the bus it is in, and that is "3" in case of PCI, where the first 32-bit word is a bit pattern containing various things, and the other two cells are a 64-bit address. I think this is correct, but we may need to add some special handling for parsing PCI host bridges in of_dma_get_range, to ensure we actually look at translations for the memory space. Arnd
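To make the cell layout described above concrete, here is a small user-space sketch that decodes one dma-ranges entry of a PCI host bridge. It assumes a child #address-cells of 3 (flags word plus 64-bit PCI address), and additionally assumes a parent #address-cells of 2 and #size-cells of 2, with cells already converted to host byte order. `struct pci_dma_range` and `parse_pci_dma_range()` are hypothetical names, not the of_pci_range_parser API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of one PCI dma-ranges entry. */
struct pci_dma_range {
	uint32_t flags;     /* first cell: bit pattern, not part of the address */
	uint64_t pci_addr;  /* second and third cells: 64-bit PCI address */
	uint64_t cpu_addr;  /* parent bus address (2 cells assumed) */
	uint64_t size;      /* range size (2 cells assumed) */
};

/* Combine two 32-bit cells, high word first, into one 64-bit value. */
static uint64_t cells_to_u64(const uint32_t *c)
{
	return ((uint64_t)c[0] << 32) | c[1];
}

/*
 * Expects 7 cells in host byte order:
 * flags, pci hi, pci lo, cpu hi, cpu lo, size hi, size lo.
 * A parser that does not know about the leading flags cell would read
 * the address starting one cell too early.
 */
static struct pci_dma_range parse_pci_dma_range(const uint32_t cells[7])
{
	struct pci_dma_range r;

	r.flags = cells[0];
	r.pci_addr = cells_to_u64(&cells[1]);
	r.cpu_addr = cells_to_u64(&cells[3]);
	r.size = cells_to_u64(&cells[5]);
	return r;
}
```

The cell counts of the parent bus vary between device trees, so a real parser has to read #address-cells and #size-cells rather than hard-coding 7.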
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
>> For OF platforms, this is called via of_dma_configure(), that checks >> dma-ranges of node that is *parent* for host bridge. Host bridge >> currently does not control this at all. > > We need to think about this a bit. Is it actually the PCI host > bridge that limits the ranges here, or the bus that it is connected > to. In the latter case, the caller needs to be adapted to handle > both. In r-car case, I'm not sure what is the source of limitation at physical level. pcie-rcar driver configures ranges for PCIe inbound transactions based on dma-ranges property in it's device tree node. In the current device tree for this platform, that only contains one range and it is in lower memory. NVMe driver tries i/o to kmalloc()ed area. That returns 0x5 addresses here. As a quick experiment, I tried to add second range to pcie-rcar's dma-ranges to cover 0x5 area - but that did not make DMA to high addresses working. My current understanding is that host bridge hardware module can't handle inbound transactions to PCI addresses above 4G - and this limitations comes from host bridge itself. I've read somewhere in the lists that pcie-rcar hardware is "32-bit" - but I don't remember where, and don't know lowlevel details. Maybe somebody from linux-renesas can elaborate? >> In current device trees no dma-ranges is defined for nodes that are >> parents to pci host bridges. This will make of_dma_configure() to fall >> back to 32-bit size for all devices on all current platforms. Thus >> applying this patch will immediately break 64-bit dma masks on all >> hardware that supports it. > > No, it won't break it, it will just fall back to swiotlb for all the > ones that are lacking the dma-ranges property. I think this is correct > behavior. I'd say - for all ones that have parents without dma-ranges property. 
As of 4.10-rc2, I see only two definitions of wide parent dma-ranges under arch/arm64/boot/dts/ - in amd/amd-seattle-soc.dtsi and apm/apm-storm.dtsi Are these the only arm64 platforms that can to DMA to high addresses? I'm not arm64 expert but I'd be surprised if that's the case. >> Also related: dma-ranges property used by several pci host bridges is >> *not* compatible with "legacy" dma-ranges parsed by of_get_dma_range() - >> former uses additional flags word at beginning. > > Can you elaborate? Do we have PCI host bridges that use wrongly formatted > dma-ranges properties? of_dma_get_range() expects format. pcie-rcar.c, pci-rcar-gen2.c, pci-xgene.c and pcie-iproc.c from drivers/pci/host/ all parse dma-ranges using of_pci_range_parser that uses format - i.e. something different from what of_dma_get_range() uses. Nikita
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Wednesday, January 4, 2017 9:24:09 AM CET Nikita Yushchenko wrote:
> > commit 9a57d58d116800a535510053136c6dd7a9c26e25
> > Author: Arnd Bergmann
> > Date: Tue Nov 17 14:06:55 2015 +0100
> >
> > [EXPERIMENTAL] ARM64: check implement dma_set_mask
> >
> > Needs work for coherent mask
> >
> > Signed-off-by: Arnd Bergmann
>
> Unfortunately this is far incomplete
>
> > @@ -957,6 +983,18 @@ void arch_setup_dma_ops(struct device *dev, u64
> > dma_base, u64 size,
> >  	if (!dev->archdata.dma_ops)
> >  		dev->archdata.dma_ops = &swiotlb_dma_ops;
> >
> > +	/*
> > +	 * we don't yet support buses that have a non-zero mapping.
> > +	 * Let's hope we won't need it
> > +	 */
> > +	WARN_ON(dma_base != 0);
> > +
> > +	/*
> > +	 * Whatever the parent bus can set. A device must not set
> > +	 * a DMA mask larger than this.
> > +	 */
> > +	dev->archdata.parent_dma_mask = size;
> > +
>
> ... because size/mask passed here for PCI devices are meaningless.
>
> For OF platforms, this is called via of_dma_configure(), that checks
> dma-ranges of node that is *parent* for host bridge. Host bridge
> currently does not control this at all.

We need to think about this a bit. Is it actually the PCI host bridge
that limits the ranges here, or the bus that it is connected to. In the
latter case, the caller needs to be adapted to handle both.

> In current device trees no dma-ranges is defined for nodes that are
> parents to pci host bridges. This will make of_dma_configure() to fall
> back to 32-bit size for all devices on all current platforms. Thus
> applying this patch will immediately break 64-bit dma masks on all
> hardware that supports it.

No, it won't break it, it will just fall back to swiotlb for all the
ones that are lacking the dma-ranges property. I think this is correct
behavior.

> Also related: dma-ranges property used by several pci host bridges is
> *not* compatible with "legacy" dma-ranges parsed by of_get_dma_range() -
> former uses additional flags word at beginning.

Can you elaborate? Do we have PCI host bridges that use wrongly
formatted dma-ranges properties?

	Arnd
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
> commit 9a57d58d116800a535510053136c6dd7a9c26e25
> Author: Arnd Bergmann
> Date: Tue Nov 17 14:06:55 2015 +0100
>
> [EXPERIMENTAL] ARM64: check implement dma_set_mask
>
> Needs work for coherent mask
>
> Signed-off-by: Arnd Bergmann

Unfortunately this is far incomplete

> @@ -957,6 +983,18 @@ void arch_setup_dma_ops(struct device *dev, u64
> dma_base, u64 size,
>  	if (!dev->archdata.dma_ops)
>  		dev->archdata.dma_ops = &swiotlb_dma_ops;
>
> +	/*
> +	 * we don't yet support buses that have a non-zero mapping.
> +	 * Let's hope we won't need it
> +	 */
> +	WARN_ON(dma_base != 0);
> +
> +	/*
> +	 * Whatever the parent bus can set. A device must not set
> +	 * a DMA mask larger than this.
> +	 */
> +	dev->archdata.parent_dma_mask = size;
> +

... because size/mask passed here for PCI devices are meaningless.

For OF platforms, this is called via of_dma_configure(), that checks
dma-ranges of node that is *parent* for host bridge. Host bridge
currently does not control this at all.

In current device trees no dma-ranges is defined for nodes that are
parents to pci host bridges. This will make of_dma_configure() to fall
back to 32-bit size for all devices on all current platforms. Thus
applying this patch will immediately break 64-bit dma masks on all
hardware that supports it.

Also related: dma-ranges property used by several pci host bridges is
*not* compatible with "legacy" dma-ranges parsed by of_get_dma_range() -
former uses additional flags word at beginning.
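The layout difference mentioned in the last paragraph can be sketched in
userspace C. This is only an illustration, not the kernel's parser: it
assumes a PCI-style entry whose child address takes three cells (the
leading phys.hi flags cell followed by a 64-bit address), with two cells
each for the parent address and the size. The names struct dma_range,
read_cells(), parse_pci_entry() and pci_space_code() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded dma-ranges entry (hypothetical type for this sketch). */
struct dma_range {
	uint32_t pci_flags;	/* leading phys.hi cell of the PCI layout */
	uint64_t child_addr;	/* address as seen on the PCI bus */
	uint64_t parent_addr;	/* address as seen on the parent bus */
	uint64_t size;
};

/* Combine up to two 32-bit cells, most significant cell first. */
static uint64_t read_cells(const uint32_t *cells, size_t ncells)
{
	uint64_t v = 0;

	assert(ncells <= 2);
	for (size_t i = 0; i < ncells; i++)
		v = (v << 32) | cells[i];
	return v;
}

/*
 * Assumed cell layout (7 cells):
 *   <phys.hi child.hi child.lo parent.hi parent.lo size.hi size.lo>
 * A parser that expects the plain layout would misread phys.hi as the
 * top of the child address.
 */
static struct dma_range parse_pci_entry(const uint32_t cells[7])
{
	struct dma_range r;

	r.pci_flags = cells[0];
	r.child_addr = read_cells(cells + 1, 2);
	r.parent_addr = read_cells(cells + 3, 2);
	r.size = read_cells(cells + 5, 2);
	return r;
}

/* Space code from phys.hi: 0 config, 1 I/O, 2 32-bit mem, 3 64-bit mem. */
static unsigned int pci_space_code(uint32_t phys_hi)
{
	return (phys_hi >> 24) & 0x3;
}
```

With the flags cell split off first, the remaining cells decode the same
way as in the plain layout, which is why the two formats cannot share a
parser without knowing the child's #address-cells.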
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Tuesday, January 3, 2017 6:44:44 PM CET Will Deacon wrote:
> > @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev,
> > struct sg_table *sgt,
> >
> >  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
> >  {
> > +#ifdef CONFIG_PCI
> > +	if (dev_is_pci(hwdev)) {
> > +		struct pci_dev *pdev = to_pci_dev(hwdev);
> > +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> > +
> > +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> > +		    (mask & (*br->dev.dma_mask)) != mask)
> > +			return 0;
> > +	}
> > +#endif
>
> Hmm, but this makes it look like the problem is both arm64 and swiotlb
> specific, when in reality it's not. Perhaps another hack you could try
> would be to register a PCI bus notifier in the host bridge looking for
> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
> device before the driver has probed, but adding a dma_set_mask callback
> to limit the mask to what you need?
>
> I agree that it would be better if dma_set_mask handled all of this
> transparently, but it's all based on the underlying ops rather than the
> bus type.

This is what I prototyped a long time ago when this first came up.
I still think this needs to be solved properly for all of arm64, not
with a PCI specific hack, and in particular not using notifiers.

	Arnd

commit 9a57d58d116800a535510053136c6dd7a9c26e25
Author: Arnd Bergmann
Date:   Tue Nov 17 14:06:55 2015 +0100

    [EXPERIMENTAL] ARM64: check implement dma_set_mask

    Needs work for coherent mask

    Signed-off-by: Arnd Bergmann

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 243ef256b8c9..a57e7bb10e71 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -22,6 +22,7 @@ struct dev_archdata {
 	void *iommu;			/* private IOMMU data */
 #endif
 	bool dma_coherent;
+	u64 parent_dma_mask;
 };
 
 struct pdev_archdata {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 290a84f3351f..aa65875c611b 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -352,6 +352,31 @@ static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 	return 1;
 }
 
+static int __swiotlb_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	/* mask is below swiotlb bounce buffer, so fail */
+	if (!swiotlb_dma_supported(dev, mask))
+		return -EIO;
+
+	/*
+	 * because of the swiotlb, we can return success for
+	 * larger masks, but need to ensure that bounce buffers
+	 * are used above parent_dma_mask, so set that as
+	 * the effective mask.
+	 */
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static struct dma_map_ops swiotlb_dma_ops = {
 	.alloc = __dma_alloc,
 	.free = __dma_free,
@@ -367,6 +392,7 @@ static struct dma_map_ops swiotlb_dma_ops = {
 	.sync_sg_for_device = __swiotlb_sync_sg_for_device,
 	.dma_supported = __swiotlb_dma_supported,
 	.mapping_error = swiotlb_dma_mapping_error,
+	.set_dma_mask = __swiotlb_set_dma_mask,
 };
 
 static int __init atomic_pool_init(void)
@@ -957,6 +983,18 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->archdata.dma_ops)
 		dev->archdata.dma_ops = &swiotlb_dma_ops;
 
+	/*
+	 * we don't yet support buses that have a non-zero mapping.
+	 * Let's hope we won't need it
+	 */
+	WARN_ON(dma_base != 0);
+
+	/*
+	 * Whatever the parent bus can set. A device must not set
+	 * a DMA mask larger than this.
+	 */
+	dev->archdata.parent_dma_mask = size;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }
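The clamping step in __swiotlb_set_dma_mask() can be modelled outside the
kernel. The sketch below is a simplified userspace model, not the patch
itself: struct fake_dev and model_set_dma_mask() are hypothetical names,
the swiotlb_dma_supported() check is left out, and parent_dma_mask is
assumed to hold a mask value (the prototype stores the size argument
there as-is).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the prototype touches. */
struct fake_dev {
	uint64_t *dma_mask;		/* NULL: device cannot do DMA */
	uint64_t parent_dma_mask;	/* limit recorded for the parent bus */
};

/*
 * Model of the mask-setting path: reject devices without a dma_mask,
 * otherwise accept the request but store at most parent_dma_mask, so
 * that anything above the parent's limit would go through bounce
 * buffers.
 */
static int model_set_dma_mask(struct fake_dev *dev, uint64_t mask)
{
	assert(dev != NULL);

	if (!dev->dma_mask)
		return -EIO;

	if (mask > dev->parent_dma_mask)
		mask = dev->parent_dma_mask;

	*dev->dma_mask = mask;
	return 0;
}
```

Note that in this model a request for a 64-bit mask succeeds even behind
a 32-bit parent; the driver is not told about the limit, it only ends up
with a smaller effective mask.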
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On 01/03/2017 01:01 PM, Nikita Yushchenko wrote:
>>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>>> bridge has limitations on inbound transactions addressing. Example of
>>> such setup is NVME SSD device connected to RCAR PCIe controller.
>>>
>>> Previously there was attempt to handle this via bus notifier: after
>>> driver is attached to PCI device, bridge driver gets notifier callback,
>>> and resets dma_mask from there. However, this is racy: PCI device driver
>>> could already allocate buffers and/or start i/o in probe routine.
>>> In NVME case, i/o is started in workqueue context, and this race gives
>>> "sometimes works, sometimes not" effect.
>>>
>>> Proper solution should make driver's dma_set_mask() call to fail if host
>>> bridge can't support mask being set.
>>>
>>> This patch makes __swiotlb_dma_supported() to check mask being set for
>>> PCI device against dma_mask of struct device corresponding to PCI host
>>> bridge (one with name "pci:YY"), if that dma_mask is set.
>>>
>>> This is the least destructive approach: currently dma_mask of that device
>>> object is not used anyhow, thus all existing setups will work as before,
>>> and modification is required only in actually affected components -
>>> driver of particular PCI host bridge, and dma_map_ops of particular
>>> platform.
>>>
>>> Signed-off-by: Nikita Yushchenko
>>> ---
>>> arch/arm64/mm/dma-mapping.c | 11 +++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>>> index 290a84f..49645277 100644
>>> --- a/arch/arm64/mm/dma-mapping.c
>>> +++ b/arch/arm64/mm/dma-mapping.c
>>> @@ -28,6 +28,7 @@
>>> #include
>>> #include
>>> #include
>>> +#include
>>>
>>> #include
>>>
>>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev,
>>> struct sg_table *sgt,
>>>
>>>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>>>  {
>>> +#ifdef CONFIG_PCI
>>> +	if (dev_is_pci(hwdev)) {
>>> +		struct pci_dev *pdev = to_pci_dev(hwdev);
>>> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
>>> +
>>> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
>>> +		    (mask & (*br->dev.dma_mask)) != mask)
>>> +			return 0;
>>> +	}
>>> +#endif
>>
>> Hmm, but this makes it look like the problem is both arm64 and swiotlb
>> specific, when in reality it's not. Perhaps another hack you could try
>> would be to register a PCI bus notifier in the host bridge looking for
>> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
>> device before the driver has probed, but adding a dma_set_mask callback
>> to limit the mask to what you need?
>
> This is what Renesas BSP tries to do and it does not work.
>
> BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but
> i/o can be started before that.

Hm. This is strange statement:

really_probe
|->driver_sysfs_add
   |-> blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
                                    BUS_NOTIFY_BIND_DRIVER, dev);
...
|- ret = drv->probe(dev);
...
|- driver_bound(dev);
   |- blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
                                   BUS_NOTIFY_BOUND_DRIVER, dev);

Am I missing smth?

-- 
regards,
-grygorii
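The ordering in the call chain above can be written down as a toy model.
This is a userspace sketch only, assuming the sequence shown in the mail
(BIND_DRIVER notification, then the driver's probe, then BOUND_DRIVER on
success); enum probe_event, struct event_log and model_really_probe()
are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

enum probe_event { EV_BIND_DRIVER, EV_PROBE, EV_BOUND_DRIVER };

#define MAX_EVENTS 8

/* Records the order in which the modelled steps happen. */
struct event_log {
	enum probe_event ev[MAX_EVENTS];
	size_t n;
};

static void log_event(struct event_log *log, enum probe_event e)
{
	assert(log->n < MAX_EVENTS);
	log->ev[log->n++] = e;
}

/*
 * Toy model of the sequence in really_probe(): the BIND_DRIVER
 * notification is sent from driver_sysfs_add() before drv->probe()
 * runs, and BOUND_DRIVER is sent from driver_bound() after probe
 * succeeded.
 */
static int model_really_probe(struct event_log *log, int probe_ret)
{
	log_event(log, EV_BIND_DRIVER);		/* driver_sysfs_add() */
	log_event(log, EV_PROBE);		/* drv->probe(dev) */
	if (probe_ret)
		return probe_ret;		/* driver_bound() not reached */
	log_event(log, EV_BOUND_DRIVER);	/* driver_bound() */
	return 0;
}
```

In this model a notifier that reacts to EV_BIND_DRIVER always runs before
the probe step, while one that reacts to EV_BOUND_DRIVER runs after it,
which is the distinction the question above is getting at.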
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>> bridge has limitations on inbound transactions addressing. Example of
>> such setup is NVME SSD device connected to RCAR PCIe controller.
>>
>> Previously there was attempt to handle this via bus notifier: after
>> driver is attached to PCI device, bridge driver gets notifier callback,
>> and resets dma_mask from there. However, this is racy: PCI device driver
>> could already allocate buffers and/or start i/o in probe routine.
>> In NVME case, i/o is started in workqueue context, and this race gives
>> "sometimes works, sometimes not" effect.
>>
>> Proper solution should make driver's dma_set_mask() call to fail if host
>> bridge can't support mask being set.
>>
>> This patch makes __swiotlb_dma_supported() to check mask being set for
>> PCI device against dma_mask of struct device corresponding to PCI host
>> bridge (one with name "pci:YY"), if that dma_mask is set.
>>
>> This is the least destructive approach: currently dma_mask of that device
>> object is not used anyhow, thus all existing setups will work as before,
>> and modification is required only in actually affected components -
>> driver of particular PCI host bridge, and dma_map_ops of particular
>> platform.
>>
>> Signed-off-by: Nikita Yushchenko
>> ---
>> arch/arm64/mm/dma-mapping.c | 11 +++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index 290a84f..49645277 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
>> @@ -28,6 +28,7 @@
>> #include
>> #include
>> #include
>> +#include
>>
>> #include
>>
>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev,
>> struct sg_table *sgt,
>>
>>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>>  {
>> +#ifdef CONFIG_PCI
>> +	if (dev_is_pci(hwdev)) {
>> +		struct pci_dev *pdev = to_pci_dev(hwdev);
>> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
>> +
>> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
>> +		    (mask & (*br->dev.dma_mask)) != mask)
>> +			return 0;
>> +	}
>> +#endif
>
> Hmm, but this makes it look like the problem is both arm64 and swiotlb
> specific, when in reality it's not. Perhaps another hack you could try
> would be to register a PCI bus notifier in the host bridge looking for
> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
> device before the driver has probed, but adding a dma_set_mask callback
> to limit the mask to what you need?

This is what Renesas BSP tries to do and it does not work.

BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but
i/o can be started before that.
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On Thu, Dec 29, 2016 at 11:45:03PM +0300, Nikita Yushchenko wrote:
> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
> bridge has limitations on inbound transactions addressing. Example of
> such setup is NVME SSD device connected to RCAR PCIe controller.
>
> Previously there was attempt to handle this via bus notifier: after
> driver is attached to PCI device, bridge driver gets notifier callback,
> and resets dma_mask from there. However, this is racy: PCI device driver
> could already allocate buffers and/or start i/o in probe routine.
> In NVME case, i/o is started in workqueue context, and this race gives
> "sometimes works, sometimes not" effect.
>
> Proper solution should make driver's dma_set_mask() call to fail if host
> bridge can't support mask being set.
>
> This patch makes __swiotlb_dma_supported() to check mask being set for
> PCI device against dma_mask of struct device corresponding to PCI host
> bridge (one with name "pci:YY"), if that dma_mask is set.
>
> This is the least destructive approach: currently dma_mask of that device
> object is not used anyhow, thus all existing setups will work as before,
> and modification is required only in actually affected components -
> driver of particular PCI host bridge, and dma_map_ops of particular
> platform.
>
> Signed-off-by: Nikita Yushchenko
> ---
> arch/arm64/mm/dma-mapping.c | 11 +++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 290a84f..49645277 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -28,6 +28,7 @@
> #include
> #include
> #include
> +#include
>
> #include
>
> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev,
> struct sg_table *sgt,
>
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +	if (dev_is_pci(hwdev)) {
> +		struct pci_dev *pdev = to_pci_dev(hwdev);
> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +		    (mask & (*br->dev.dma_mask)) != mask)
> +			return 0;
> +	}
> +#endif

Hmm, but this makes it look like the problem is both arm64 and swiotlb
specific, when in reality it's not. Perhaps another hack you could try
would be to register a PCI bus notifier in the host bridge looking for
BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
device before the driver has probed, but adding a dma_set_mask callback
to limit the mask to what you need?

I agree that it would be better if dma_set_mask handled all of this
transparently, but it's all based on the underlying ops rather than the
bus type.

Will
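The mask test in the quoted hunk is a plain subset check, which can be
tried in userspace. The sketch below is an illustration under
assumptions: dma_bit_mask() is a stand-in for the kernel's
DMA_BIT_MASK() macro and mask_fits_bridge() is a hypothetical helper
that mirrors the condition in the hunk, with an unset bridge mask
represented as 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for DMA_BIT_MASK(n): the low n bits set. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/*
 * Mirrors the condition in the hunk: a bridge mask of 0 means no limit
 * was recorded; otherwise every bit of the requested mask must also be
 * set in the bridge mask.
 */
static bool mask_fits_bridge(uint64_t mask, uint64_t bridge_mask)
{
	if (bridge_mask == 0)
		return true;
	return (mask & bridge_mask) == mask;
}
```

Under this check a 64-bit request behind a bridge limited to 32 bits is
rejected, so a driver that falls back to a 32-bit mask on failure would
end up with a mask the bridge can handle.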
Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
On 12/30/2016 12:46 PM, Sergei Shtylyov wrote:

>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>
>    Its.
>
>> bridge has limitations on inbound transactions addressing. Example of
>> such setup is NVME
>
>    Isn't it called NVMe?
>
>> SSD device connected to RCAR PCIe controller.
>
>    R=Car.

   Sorry, R-Car. :-)

[...]

MBR, Sergei