Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev
On Wed, Apr 27, 2016 at 04:11:17PM +0200, Arnd Bergmann wrote:
> On Wednesday 27 April 2016 14:59:00 Catalin Marinas wrote:
> > I would be in favour of a dma_inherit() function as well. We could hack
> > something up in the arch code (like below) but I would rather prefer an
> > explicit dma_inherit() call by drivers creating such devices.
> >
> > diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
> > index ba437f090a74..ea6fb9b0e8fa 100644
> > --- a/arch/arm64/include/asm/dma-mapping.h
> > +++ b/arch/arm64/include/asm/dma-mapping.h
> > @@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
> >
> >  static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
> >  {
> > -	if (dev && dev->archdata.dma_ops)
> > -		return dev->archdata.dma_ops;
> > +	while (dev) {
> > +		if (dev->archdata.dma_ops)
> > +			return dev->archdata.dma_ops;
> > +		dev = dev->parent;
> > +	}
>
> I think this would be a very bad idea: we don't want to have random
> devices be able to perform DMA just because their parent devices
> have been set up that way.

I agree, it's a big hack. It would be nice to have a simpler way to do
this in driver code rather than explicitly calling
of_dma_configure/arch_setup_dma_ops as per the original patch in this
thread.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev
On Wed, Apr 27, 2016 at 08:41:06AM +0300, Felipe Balbi wrote:
> Grygorii Strashko writes:
> > On 04/26/2016 09:17 AM, Felipe Balbi wrote:
> >> Grygorii Strashko writes:
> >>> Currently not all DMA parameters are configured properly for the
> >>> "xhci-hcd" platform device, which is created manually. For example:
> >>> dma_pfn_offset, dma_ops and the IOMMU configuration will not
> >>> correspond to the "dwc3" device's configuration. As a result, this
> >>> will cause problems like wrong DMA address translation on platforms
> >>> with LPAE enabled, like Keystone 2.
> >>>
> >>> When the platform is using DT boot mode, the DMA configuration will
> >>> be parsed and applied from DT. So, to fix this issue, reuse the
> >>> of_dma_configure() API and retrieve the DMA configuration for
> >>> "xhci-hcd" from the DWC3 device node.
> >>
> >> patch is incomplete. You left out non-DT users which might suffer from
> >> the same problem.
> >
> > Honestly, I don't know how to fix it gracefully for the non-DT case.
> > I can update the commit message to mention that this is a fix for the
> > DT case only.
>
> no, that won't do :-) There are other users for this driver and they are
> all "out-of-compliance" when it comes to DMA usage. Apparently, the
> desired behavior is to pass the correct device to the DMA API, which the
> gadget side is already doing (see below). For the host side, the fix has
> to be more involved.
>
> Frankly, I'd prefer that DMA setup could be inherited from the parent
> device, then it wouldn't really matter and a bunch of this could be
> simplified. Some sort of dma_inherit(struct device *dev, struct device
> *parent) would go a long way, IMHO.

I would be in favour of a dma_inherit() function as well. We could hack
something up in the arch code (like below) but I would rather prefer an
explicit dma_inherit() call by drivers creating such devices.
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index ba437f090a74..ea6fb9b0e8fa 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
 
 static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
 {
-	if (dev && dev->archdata.dma_ops)
-		return dev->archdata.dma_ops;
+	while (dev) {
+		if (dev->archdata.dma_ops)
+			return dev->archdata.dma_ops;
+		dev = dev->parent;
+	}
 
 	/*
 	 * We expect no ISA devices, and all other DMA masters are expected to

-- 
Catalin
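A dma_inherit() helper of the kind Felipe and Catalin discuss above could be sketched as follows. This is a self-contained model, not kernel code: struct device is reduced to its DMA-related fields, and dma_inherit() itself is only a proposal from this thread, not an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced model of struct device: only the DMA-related fields matter here. */
struct device {
	struct device *parent;
	uint64_t *dma_mask;		/* points at the owner's mask storage */
	uint64_t coherent_dma_mask;
	void *dma_parms;
	const void *dma_ops;		/* stand-in for archdata.dma_ops */
};

/*
 * Hypothetical dma_inherit(): copy the parent's DMA configuration into a
 * manually created child device.  Unlike the __generic_dma_ops() loop in
 * the patch above, this is an explicit, one-shot call made by the driver
 * that creates the child, so devices don't silently gain DMA capability
 * just by having a DMA-capable parent.
 */
static void dma_inherit(struct device *dev, struct device *parent)
{
	dev->dma_mask = parent->dma_mask;
	dev->coherent_dma_mask = parent->coherent_dma_mask;
	dev->dma_parms = parent->dma_parms;
	dev->dma_ops = parent->dma_ops;
}
```

With such a helper, dwc3_host_init() could call dma_inherit(&xhci->dev, dwc->dev) instead of copying the individual fields by hand.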
Re: dwc3 initiated xhci probe problem in arm64 4.4 kernel due to DMA setup
On Fri, Apr 15, 2016 at 02:56:17PM +0300, Grygorii Strashko wrote:
> From c68225e97e8c9505aca4ceab19a0d8e4dde31b73 Mon Sep 17 00:00:00 2001
> From: Grygorii Strashko
> Date: Thu, 31 Mar 2016 19:40:52 +0300
> Subject: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev
>
> Currently not all DMA parameters are configured properly for the
> "xhci-hcd" platform device, which is created manually. For example:
> dma_pfn_offset, dma_ops and the IOMMU configuration will not correspond
> to the "dwc3" device's configuration. As a result, this will cause
> problems like wrong DMA address translation on platforms with LPAE
> enabled, like Keystone 2.
>
> When the platform is using DT boot mode, the DMA configuration will be
> parsed and applied from DT. So, to fix this issue, reuse the
> of_dma_configure() API and retrieve the DMA configuration for
> "xhci-hcd" from the DWC3 device node.
>
> Signed-off-by: Grygorii Strashko
> ---
>  drivers/usb/dwc3/host.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
> index c679f63..93c8ef9 100644
> --- a/drivers/usb/dwc3/host.c
> +++ b/drivers/usb/dwc3/host.c
> @@ -17,6 +17,7 @@
>  
>  #include
>  #include
> +#include
>  
>  #include "core.h"
>  
> @@ -32,12 +33,7 @@ int dwc3_host_init(struct dwc3 *dwc)
>  		return -ENOMEM;
>  	}
>  
> -	dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
> -
>  	xhci->dev.parent	= dwc->dev;
> -	xhci->dev.dma_mask	= dwc->dev->dma_mask;
> -	xhci->dev.dma_parms	= dwc->dev->dma_parms;
> -
>  	dwc->xhci = xhci;
>  
>  	ret = platform_device_add_resources(xhci, dwc->xhci_resources,
> @@ -62,6 +58,15 @@ int dwc3_host_init(struct dwc3 *dwc)
>  	phy_create_lookup(dwc->usb3_generic_phy, "usb3-phy",
>  			  dev_name(&xhci->dev));
>  
> +	if (IS_ENABLED(CONFIG_OF) && dwc->dev->of_node) {
> +		of_dma_configure(&xhci->dev, dwc->dev->of_node);
> +	} else {
> +		dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
> +
> +		xhci->dev.dma_mask	= dwc->dev->dma_mask;
> +		xhci->dev.dma_parms	= dwc->dev->dma_parms;
> +	}
> +
>  	ret = platform_device_add(xhci);

This looks fine to me, though I wonder whether we can make this more
generic so that we don't have to update each driver.

Question for Arnd: would it make sense to add an of_dma_configure(dev,
dev->parent->of_node) call to platform_device_add() _if_ the device
being added does not have an of_node? Alternatively, this could be done
in the arch code via bus notifiers (we used to have one on arm64 for
cache coherency but I removed it in 2189064795dc ("arm64: Implement
set_arch_dma_coherent_ops() to replace bus notifiers")).

-- 
Catalin
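Catalin's question above — having platform_device_add() fall back to the parent's of_node when the new device has none of its own — can be modeled with a small self-contained sketch. Here of_dma_configure() is stubbed out, the structures are reduced to the relevant fields, and the helper name is illustrative; none of this is the actual kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

struct device_node { int unused; };

struct device {
	struct device *parent;
	struct device_node *of_node;
};

/* Stub that records which node the DMA configuration was taken from. */
static struct device_node *configured_node;

static void of_dma_configure(struct device *dev, struct device_node *np)
{
	(void)dev;
	configured_node = np;
}

/*
 * Sketch of the proposed fallback: a manually created platform device
 * without an of_node inherits its DMA configuration from the parent's
 * node, the way pci_dma_configure() uses the host bridge's node for
 * devices behind a PCI bridge.
 */
static void platform_device_dma_config(struct device *dev)
{
	struct device_node *np = dev->of_node;

	if (!np && dev->parent)
		np = dev->parent->of_node;
	if (np)
		of_dma_configure(dev, np);
}
```

For the dwc3 case, the manually created xhci device has no node, so the DMA configuration would be picked up from the dwc3 parent node without any per-driver code.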
Re: dwc3 initiated xhci probe problem in arm64 4.4 kernel due to DMA setup
On Fri, Apr 15, 2016 at 01:30:01PM +0300, Felipe Balbi wrote:
> Catalin Marinas <catalin.mari...@arm.com> writes:
> > On Fri, Apr 15, 2016 at 11:01:08AM +0100, Catalin Marinas wrote:
> >> On Fri, Apr 15, 2016 at 12:49:15PM +0300, Felipe Balbi wrote:
> >> > Catalin Marinas <catalin.mari...@arm.com> writes:
> >> > > On Thu, Apr 14, 2016 at 12:46:47PM +0000, David Fisher wrote:
> >> > >> dwc3 is in dual-role, with "synopsys,dwc3" specified in DT.
> >> > >>
> >> > >> When xhci is probed, initiated from dwc3/host.c (not DT), we get:
> >> > >>   xhci-hcd: probe of xhci-hcd.7.auto failed with error -5
> >> > >> This -EIO error originated from inside dma_set_mask() down in
> >> > >> include/asm-generic/dma-mapping-common.h
> >> > >>
> >> > >> If "generic-xhci" is specified in DT instead, it probes fine as a
> >> > >> host-only dwc3.
> >> > >> The difference between DT initiated probe and dwc3 initiated probe
> >> > >> is that when DT initiated probe gets to dma_supported,
> >> > >> dma_supported is implemented by swiotlb_dma_supported (previously
> >> > >> set up by a DT call to arch_dma_setup_ops).
> >> > >> Whereas when dwc3 initiated xhci probe gets to dma_supported,
> >> > >> arch_dma_setup_ops has not been called and dma_supported is only
> >> > >> implemented by __dummy_dma_supported, returning 0.
> >> > >>
> >> > >> Bisecting finds the "bad" commit as
> >> > >> 1dccb598df549d892b6450c261da54cdd7af44b4 (in between 4.4-rc1 and
> >> > >> 4.4-rc2)
> >> > >> --- a/arch/arm64/include/asm/dma-mapping.h
> >> > >> --- a/arch/arm64/mm/dma-mapping.c
> >> > >>
> >> > >> Previous to this commit, dma_ops = &swiotlb_dma_ops was done in
> >> > >> arm64_dma_init
> >> > >> After this commit, the assignment is only done in arch_dma_setup_ops.
> >> > >
> >> > > This restriction was added on purpose and the arm64
> >> > > __generic_dma_ops() now has a comment:
> >> > >
> >> > >	/*
> >> > >	 * We expect no ISA devices, and all other DMA masters are
> >> > >	 * expected to have someone call arch_setup_dma_ops at device
> >> > >	 * creation time.
> >> > >	 */
> >> >
> >> > how ?
> >>
> >> Usually by calling arch_setup_dma_ops().
> >
> > Or of_dma_configure(), I forgot to mention this (see the
> > pci_dma_configure() function as an example).
>
> the device is manually created, there's not OF/DT for it ;-)

As for PCI, the created device doesn't have a node but the bridge does.
Something like below, completely untested:

diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index c679f63783ae..96d8babd7f23 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -32,7 +32,10 @@ int dwc3_host_init(struct dwc3 *dwc)
 		return -ENOMEM;
 	}
 
-	dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
+	if (IS_ENABLED(CONFIG_OF) && dwc->dev->of_node)
+		of_dma_configure(&xhci->dev, dwc->dev->of_node);
+	else
+		dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
 
 	xhci->dev.parent	= dwc->dev;
 	xhci->dev.dma_mask	= dwc->dev->dma_mask;

-- 
Catalin
Re: dwc3 initiated xhci probe problem in arm64 4.4 kernel due to DMA setup
On Thu, Apr 14, 2016 at 12:46:47PM +0000, David Fisher wrote:
> dwc3 is in dual-role, with "synopsys,dwc3" specified in DT.
>
> When xhci is probed, initiated from dwc3/host.c (not DT), we get:
>   xhci-hcd: probe of xhci-hcd.7.auto failed with error -5
> This -EIO error originated from inside dma_set_mask() down in
> include/asm-generic/dma-mapping-common.h
>
> If "generic-xhci" is specified in DT instead, it probes fine as a
> host-only dwc3.
> The difference between DT initiated probe and dwc3 initiated probe is
> that when DT initiated probe gets to dma_supported, dma_supported is
> implemented by swiotlb_dma_supported (previously set up by a DT call to
> arch_dma_setup_ops).
> Whereas when dwc3 initiated xhci probe gets to dma_supported,
> arch_dma_setup_ops has not been called and dma_supported is only
> implemented by __dummy_dma_supported, returning 0.
>
> Bisecting finds the "bad" commit as
> 1dccb598df549d892b6450c261da54cdd7af44b4 (in between 4.4-rc1 and 4.4-rc2)
> --- a/arch/arm64/include/asm/dma-mapping.h
> --- a/arch/arm64/mm/dma-mapping.c
>
> Previous to this commit, dma_ops = &swiotlb_dma_ops was done in
> arm64_dma_init
> After this commit, the assignment is only done in arch_dma_setup_ops.

This restriction was added on purpose and the arm64 __generic_dma_ops()
now has a comment:

	/*
	 * We expect no ISA devices, and all other DMA masters are expected to
	 * have someone call arch_setup_dma_ops at device creation time.
	 */

The commit above also describes why it is dangerous to assume a fallback
to swiotlb ops.

> We're not using any dwc3-.c wrapper here, but we've not needed it
> before this commit. Relevant ?
> It might also be possible that we've messed up some KConfig that changed
> between versions ?
> Or is this a bug that needs patching - in arm(64) dma or something
> different for dma setup in dwc3/host.c ?

What I don't understand is why of_dma_configure() is not called for a
device with compatible="synopsys,dwc3" in DT (IIUC, you said that you
specify this in DT and of_platform_populate should do the right
configuration).

-- 
Catalin
Re: [PATCH 1/3] Honor ACPI _CCA attribute setting
On Fri, Aug 14, 2015 at 08:45:22AM +0700, Suravee Suthikulpanit wrote:
> On 8/13/15 04:51, Jeremy Linton wrote:
> > ACPI configurations can now mark devices as noncoherent, support that
> > choice.
> >
> > Signed-off-by: Jeremy Linton <jeremy.lin...@arm.com>
> > ---
> >  include/acpi/acpi_bus.h | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > index 83061ca..7ecb8e4 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -399,7 +399,7 @@ static inline bool acpi_check_dma(struct acpi_device *adev, bool *coherent)
> >   * case 1. Do not support and disable DMA.
> >   * case 2. Support but rely on arch-specific cache maintenance for
> >   *         non-coherence DMA operations.
> > - * Currently, we implement case 1 above.
> > + * Currently, we implement case 2 above.
> >   *
> >   * For the case when _CCA is missing (i.e. cca_seen=0) and
> >   * platform specifies ACPI_CCA_REQUIRED, we do not support DMA,
> > @@ -407,7 +407,8 @@ static inline bool acpi_check_dma(struct acpi_device *adev, bool *coherent)
> >   *
> >   * See acpi_init_coherency() for more info.
> >   */
> > -	if (adev->flags.coherent_dma) {
> > +	if (adev->flags.coherent_dma ||
> > +	    (adev->flags.cca_seen && IS_ENABLED(CONFIG_ARM64))) {
> >  		ret = true;
> >  		if (coherent)
> >  			*coherent = adev->flags.coherent_dma;
>
> This change was in my earlier revisions for the original patch series
> to add ACPI CCA support. At the time, this was pushed back since we
> were not sure whether this would be a useful case, and whether such
> hardware exists.
>
> Would it be useful to document somewhere (may be in the GIT commit
> message) about which hardware might need this?

So far, it's the ARM Juno development board (the emphasis here is on
being able to use it for development, not a production system). I think
the commit log should also give you credit for the original
implementation.

> Arnd/Catalin, any feedback on this?

That's where it was left in the previous thread:

https://lkml.org/lkml/2015/5/21/376

(and I'll refrain from further comments ;))

-- 
Catalin
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Mon, May 19, 2014 at 09:21:17AM +0100, Jon Medhurst (Tixy) wrote:
> On Fri, 2014-05-16 at 18:40 +0100, Catalin Marinas wrote:
> > On Fri, May 16, 2014 at 06:08:45PM +0100, Jon Medhurst (Tixy) wrote:
> > > On Fri, 2014-05-16 at 13:55 +0100, Catalin Marinas wrote:
> > > [...]
> > >
> > > It could if arm64 would restrict the DMA addresses to 32-bit, but it
> > > doesn't and I end up on my platform with USB DMA buffers allocated
> > > above 4GB.
> >
> > dma_alloc_coherent() on arm64 should return 32-bit addresses if the
> > coherent_dma_mask is set to 32-bit.
>
> Not if you have CONFIG_DMA_CMA. Unless I have misread the code, enabling
> CMA means memory comes from a common pool carved out at boot with no way
> for drivers to specify its restrictions [1]. It's what I've spent most
> of the week trying to work around in a clean way, and have finally given
> up.
>
> > The easiest hack would be to pass a limit to dma_contiguous_reserve()
> > in arm64_memblock_init(), something like this:
>
> That is by far the easiest but I dismissed it because it affects all
> platforms built from that source tree; and if the hack were extended to
> include a kernel config option, that still may not work on a single
> kernel binary expected to work on multiple platforms. Basically, I've
> tried various ways to do it 'properly' and after failing am resorting to
> truly hideous hacks to the (out-of-tree driver) code - so at least other
> platforms won't be impacted.

Can you set a specific reserved memory region in the DT to be used by
CMA (via linux,cma-default), or is that just for the default size?

On arm64 we enabled CONFIG_DMA_CMA by default but compared to swiotlb it
doesn't honour GFP_DMA. The arm32 port sets the CMA limit to
arm_dma_limit, which is 32-bit or a SoC-defined one. So I'm tempted to
default to 32-bit as well if it can be overridden via DT.

-- 
Catalin
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Tue, May 20, 2014 at 08:37:10PM +0100, Arnd Bergmann wrote:

On Tuesday 20 May 2014 15:12:49 Catalin Marinas wrote:

On Tue, May 20, 2014 at 12:08:58PM +0100, Arnd Bergmann wrote:

On Tuesday 20 May 2014 12:02:46 Catalin Marinas wrote:

On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:

On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:

On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:

On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:

We probably want to default to 32-bit for arm32 in the absence of
dma-ranges. For arm64, I'd prefer if we could always mandate dma-ranges
to be present for each bus, just like we mandate ranges to be present. I
hope it's not too late for that.

dma_set_mask should definitely look at the dma-ranges properties, and
the helper that Santosh just introduced should give us all the
information we need. We just need to decide on the correct behavior.

Last time I looked at Santosh's patches I thought the dma-ranges is per
device rather than per bus. We could make it per bus only and let the
device call dma_set_mask() explicitly if it wants to restrict it
further.

Can you check again? I've read the code again yesterday to check this,
and I concluded it was correctly doing this per bus.

You are right, I missed the fact that of_dma_get_range() checks the
dma-ranges property in the node's parent. So what we need is setting the
default dma mask based on the size information in dma-ranges to
something like this:

	mask = rounddown_pow_of_two(size) - 1;
	dma_set_mask(dev, mask);	/* or dma_set_mask_and_coherent() */

I don't think we should be calling dma_set_mask here, since that would
just go and parse the same property again once we fix the
implementation. Since this is a low-level helper, we can probably just
assign the dma mask directly.

I was thinking of calling it in of_dma_configure() as that's where we
already set the default dma_mask and coherent_dma_mask. Default to
32-bit if no dma-ranges are present.

While we currently don't have a set of swiotlb DMA ops on ARM32, we do
have it on ARM64, and I think we should be using them properly. It
should really not be hard to implement a proper dma_set_mask() function
for ARM64 that is able to set up the swiotlb based on the dma-ranges
properties and always returns success but leaves the mask unchanged.

The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise
we don't have any guarantees. Since we can't honour random masks anyway,
we stick to ZONE_DMA which is currently in the 4G limit. But the driver
calls dma_set_mask() too late for any further swiotlb setup. With IOMMU
we can be more flexible around dma_set_mask(), it can be done at
run-time.

What we can do with swiotlb is to check if the mask is smaller than
ZONE_DMA. If it ever is, we have to fail dma_set_mask and hope the
driver can fall back to PIO mode or it will have to fail its probe()
function.

dma_set_(coherent_)mask check swiotlb_dma_supported() which returns
false if io_tlb_end goes beyond the device mask. So we just need to
ensure that io_tlb is allocated within ZONE_DMA.

Makes sense for dma_set_mask. Why do you do the same thing for
coherent_mask? Shouldn't that check against ZONE_DMA instead?

It depends on the meaning of coherent_dma_mask. As I understand it,
that's the dma mask used by dma_alloc_coherent() to return a memory
buffer that the device is able to access. I don't see it much different
from the dma_mask used by the streaming API. I guess some hardware could
have different masks here if they have cache coherency only for certain
memory ranges (and the coherent_dma_mask would be smaller than
dma_mask).

The point I was trying to make is a different one: For the streaming
mapping, you can allow any mask as long as the swiotlb is able to create
a bounce buffer.

dma_mask is a hardware parameter and is used by the streaming DMA API
implementation (which could be swiotlb) to decide whether to bounce or
not.

The streaming DMA API can take any CPU address as input, independent of
the dma_mask, and bounce accordingly. If it doesn't have a bounce buffer
matching the dma_mask, dma_supported() would return false.

This does not work for the coherent mapping, because that by definition
cannot use bounce buffers.

Yes, but most of the time these masks have the same value.

For dma_set_coherent_mask(), we also have to fail any call that tries to
set a mask larger than what the device hardware can do. Unlike that,
dma_set_mask() can succeed with any mask, we just have to enable swiotlb
if the mask that the driver
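The default-mask rule discussed in this exchange (mask = rounddown_pow_of_two(size) - 1, with a 32-bit fallback when no dma-ranges is present) can be written as a small self-contained helper. The function names here are illustrative, not kernel API; only the arithmetic mirrors the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the kernel's rounddown_pow_of_two() for nonzero 64-bit values. */
static uint64_t rounddown_pow_of_two_u64(uint64_t v)
{
	uint64_t r = 1;

	while (v >>= 1)
		r <<= 1;
	return r;
}

/*
 * Illustrative default: derive the mask from the dma-ranges size; with no
 * dma-ranges (modeled here as size == 0), default to a 32-bit mask so
 * drivers that never call dma_set_mask() keep working.
 */
static uint64_t default_dma_mask(uint64_t dma_ranges_size)
{
	if (!dma_ranges_size)
		return 0xffffffffULL;	/* DMA_BIT_MASK(32) */
	return rounddown_pow_of_two_u64(dma_ranges_size) - 1;
}
```

Note that Arnd later suggests additionally capping the default at 32-bit even when dma-ranges would allow more, so that 64-bit-capable drivers must opt in via dma_set_mask(), as PCI drivers already do.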
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Wed, May 21, 2014 at 03:43:39PM +0100, Arnd Bergmann wrote:

On Wednesday 21 May 2014 14:56:36 Catalin Marinas wrote:

On Tue, May 20, 2014 at 08:37:10PM +0100, Arnd Bergmann wrote:

On Tuesday 20 May 2014 15:12:49 Catalin Marinas wrote:

So what we need is setting the default dma mask based on the size
information in dma-ranges to something like this:

	mask = rounddown_pow_of_two(size) - 1;
	dma_set_mask(dev, mask);	/* or dma_set_mask_and_coherent() */

I don't think we should be calling dma_set_mask here, since that would
just go and parse the same property again once we fix the
implementation. Since this is a low-level helper, we can probably just
assign the dma mask directly.

I was thinking of calling it in of_dma_configure() as that's where we
already set the default dma_mask and coherent_dma_mask. Default to
32-bit if no dma-ranges are present.

Right. Actually it should also be capped to 32-bit, to allow
compatibility with drivers that don't call dma_set_mask and can't do
64-bit DMA. This is the normal behavior for PCI drivers. They need to
set a 64-bit mask and check the result.

OK.

The point I was trying to make is a different one: For the streaming
mapping, you can allow any mask as long as the swiotlb is able to create
a bounce buffer.

dma_mask is a hardware parameter and is used by the streaming DMA API
implementation (which could be swiotlb) to decide whether to bounce or
not. The streaming DMA API can take any CPU address as input,
independent of the dma_mask, and bounce accordingly. If it doesn't have
a bounce buffer matching the dma_mask, dma_supported() would return
false.

This does not work for the coherent mapping, because that by definition
cannot use bounce buffers.

Yes, but most of the time these masks have the same value.

Let me start again with an example:

	io_tlb_end = 0x1000000;    // 16 MB
	dma_mask   = 0x10000000;   // 256 MB
	ZONE_DMA   = 0x100000000;  // 4 GB
	max_pfn    = 0x1000000000; // 64 GB

The device driver has set dma_mask and coherent_dma_mask to 256 MB
because of a limitation in the hardware. This succeeded because
io_tlb_end is well within the dma mask. Since the coherent_dma_mask is
smaller than max_pfn, the swiotlb version of dma_alloc_coherent then
allocates the buffer using GFP_DMA. However, that likely returns an
address above dma_mask/coherent_dma_mask.

swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA). If the
returned address is not within coherent_dma_mask, it frees the allocated
pages and tries to allocate from its bounce buffer (io_tlb).

My point was that to fix this case, you must not compare the
coherent_dma_mask requested by the device against io_tlb_end but against
ZONE_DMA.

See above, I think swiotlb does the right thing. We have a problem,
however, with CMA, as we assume that the returned address is within
coherent_dma_mask (we need a contiguous_dma_supported function, similar
to swiotlb_dma_supported). We also don't limit the CMA buffer to
ZONE_DMA on arm64.

As far as I understand it, you are not allowed to create a noncacheable
mapping for a page that also has a cachable mapping, or did that change
between armv7 and armv8?

No change between ARMv7 and ARMv8. In fact, ARMv7 has clarified what
unpredictable means when you mix different attributes. Basically, we can
mix the same memory type (Normal Cacheable vs Normal NonCacheable) but
if the cacheability is different, we need to do some cache maintenance
before accessing the non-cacheable mapping the first time (clear
potentially dirty lines that could be evicted randomly). Any
speculatively loaded cache lines via the cacheable mapping would not be
hit when accessed via the non-cacheable one.

Ah, only for the first access? I had remembered this differently, thanks
for the clarification.

The first time, but assuming that you no longer write to the cacheable
alias to dirty it again (basically it's like the streaming DMA: if you
want to access the cacheable alias you need cache maintenance).

For CMA, we use the trick to change the mapping in place, but that
doesn't work for normal pages, which are mapped using huge tlbs.

That's true for arm32 and we could do the same for arm64, though I don't
think it's worth it because we could break very huge mappings (e.g.
1GB). We have plenty of VA space and we just create a new non-coherent
mapping (this latter part could be optimised, even generically, to use
huge pages where possible).

I thought we specifically did this on arm32 to avoid the duplicate
mapping because we had concluded that it would be broken even with the
updated ARMv7 definition.

We did it because it was vaguely defined as unpredictable in the ARM ARM
for a long time until the hw guys realised it's not easy to change Linux
and decided to clarify what could actually
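The swiotlb_alloc_coherent() behaviour described in this exchange — try the page allocator first, then fall back to the io_tlb bounce buffer when the returned address is above coherent_dma_mask — reduces to the following toy model. Addresses are plain integers, both allocators are modeled as fixed values, and the function name is a stand-in; this only illustrates the decision logic, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* What the two allocation paths "return" in this toy model. */
static uint64_t zone_dma_alloc_addr;	/* __get_free_pages(GFP_DMA) result */
static uint64_t io_tlb_addr;		/* start of the boot-time io_tlb buffer */

/*
 * Decision logic modeled on swiotlb_alloc_coherent(): use the page
 * allocator's buffer if the device can address it; otherwise fall back
 * to the pre-allocated bounce buffer; fail if even that is out of reach.
 * Returns 0 on failure.
 */
static uint64_t swiotlb_alloc_coherent_model(uint64_t coherent_dma_mask)
{
	if (zone_dma_alloc_addr && zone_dma_alloc_addr <= coherent_dma_mask)
		return zone_dma_alloc_addr;
	if (io_tlb_addr && io_tlb_addr <= coherent_dma_mask)
		return io_tlb_addr;
	return 0;
}
```

With the numbers from Catalin's example — io_tlb at 16 MB and a 256 MB coherent mask — a GFP_DMA page that happens to land above 256 MB is rejected and the io_tlb fallback is used instead.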
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Wed, May 21, 2014 at 05:15:06PM +0100, Rob Herring wrote:

On Wed, May 21, 2014 at 10:48 AM, Arnd Bergmann <a...@arndb.de> wrote:

On Wednesday 21 May 2014 10:26:01 Rob Herring wrote:

What are you checking against to cause a failure and what do you do on
failure? I'm guessing that PCI masks are compared to the mask of parent
bridges and PCI devices just set the mask to 32-bit if 64-bit fails.
That doesn't work if your mask needs to be somewhere between 32 and
64-bit due to some bus constraints. Perhaps that's not something we need
to worry about until we see hardware with that condition.

We should compare against the size returned by of_dma_get_range(). If
the mask requested by the driver is larger than the mask of the bus it's
attached on, dma_set_mask should fail.

We can always allow 64-bit masks if the actual bus capability is enough
to cover all the installed RAM. That is a relatively common case.

Agreed. However, if we check dma-ranges, it may be large enough for all
of RAM, but less than a full 64-bit. There is also the edge case that
you cannot set the size to 2^64, but only 2^64 - 1. That means
dma_set_mask(2^64 - 1) will always fail.

Size of 0 meaning all range?

-- 
Catalin
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:

On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:

On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:

On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:

On Mon, May 19, 2014 at 09:32:43AM +0100, Arnd Bergmann wrote:

The more important question is what happens to high buffers allocated
elsewhere that get passed into dma_map_sg by a device driver. Depending
on the DT properties of the device and its parents, this needs to do one
of three things:

a) translate the 64-bit virtual address into a 64-bit bus address
b) create an IOMMU entry for the 64-bit address and pass the 32-bit
   IOMMU address to the driver
c) use the swiotlb code to create a bounce buffer at a 32-bit DMA
   address and copy the data around

It's definitely wrong to just hardcode a DMA mask in the driver because
that code doesn't know which of the three cases is being used. Moreover,
you can't do it using an #ifdef CONFIG_ARM64, because it's completely
independent of the architecture, and we need to do the exact same logic
on ARM32 and any other architecture.

I agree. The problem we currently have is system topology description to
pass the DMA mask, and in a hierarchical way. I can see Santosh's
patches introducing dma-ranges but the coherent dma mask is still set as
32-bit. We can use the dma-ranges to infer a mask but that's only
specific to the device and the driver doesn't know whether it goes
through an iommu or not.

We definitely have to fix this very quickly, before people start
building real arm64 systems and shipping them. We should not merge any
hacks that attempt to work around the problem, but try to come to a
conclusion how to handle them properly. My hope was that we could just
always set the dma mask to whatever the DT says it should be to keep the
burden from device drivers, unless they want to restrict it further
(e.g. when the specific peripheral hardware has a bug that prevents us
from using high addresses, even though the SoC in theory supports it
everywhere).

I agree. Rob Herring argued that we should always mimic PCI and call
dma_set_mask() in drivers but default to a 32-bit mask otherwise,
independent of whether the hardware can do more or less than that, IIRC.

Can we not default to something built up from dma-ranges? Or 32-bit if
the dma-ranges property is missing?

We probably want to default to 32-bit for arm32 in the absence of
dma-ranges. For arm64, I'd prefer if we could always mandate dma-ranges
to be present for each bus, just like we mandate ranges to be present. I
hope it's not too late for that.

dma_set_mask should definitely look at the dma-ranges properties, and
the helper that Santosh just introduced should give us all the
information we need. We just need to decide on the correct behavior.

Last time I looked at Santosh's patches I thought the dma-ranges is per
device rather than per bus. We could make it per bus only and let the
device call dma_set_mask() explicitly if it wants to restrict it
further.

While we currently don't have a set of swiotlb DMA ops on ARM32, we do
have it on ARM64, and I think we should be using them properly. It
should really not be hard to implement a proper dma_set_mask() function
for ARM64 that is able to set up the swiotlb based on the dma-ranges
properties and always returns success but leaves the mask unchanged.

The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise
we don't have any guarantees. Since we can't honour random masks anyway,
we stick to ZONE_DMA which is currently in the 4G limit. But the driver
calls dma_set_mask() too late for any further swiotlb setup. With IOMMU
we can be more flexible around dma_set_mask(), it can be done at
run-time.

What we can do with swiotlb is to check if the mask is smaller than
ZONE_DMA. If it ever is, we have to fail dma_set_mask and hope the
driver can fall back to PIO mode or it will have to fail its probe()
function.

dma_set_(coherent_)mask check swiotlb_dma_supported() which returns
false if io_tlb_end goes beyond the device mask. So we just need to
ensure that io_tlb is allocated within ZONE_DMA.

For dma_set_coherent_mask(), we also have to fail any call that tries to
set a mask larger than what the device hardware can do. Unlike that,
dma_set_mask() can succeed with any mask, we just have to enable swiotlb
if the mask that the driver wants is larger than what the hardware can
do.

Currently we can't satisfy any arbitrarily small dma mask even with
swiotlb since the bounce buffer is just guaranteed to be in ZONE_DMA.
Swiotlb allows for smaller masks but we need to reserve the io_tlb
buffer early during boot and at smaller addresses. For example
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Tue, May 20, 2014 at 12:08:58PM +0100, Arnd Bergmann wrote:
On Tuesday 20 May 2014 12:02:46 Catalin Marinas wrote:
On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:
On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:
On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:

We probably want to default to 32-bit for arm32 in the absence of dma-ranges. For arm64, I'd prefer if we could always mandate dma-ranges to be present for each bus, just like we mandate ranges to be present. I hope it's not too late for that.

dma_set_mask should definitely look at the dma-ranges properties, and the helper that Santosh just introduced should give us all the information we need. We just need to decide on the correct behavior.

Last time I looked at Santosh's patches I thought dma-ranges was per device rather than per bus. We could make it per bus only and let the device call dma_set_mask() explicitly if it wants to restrict it further.

Can you check again? I've read the code again yesterday to check this, and I concluded it was correctly doing this per bus.

You are right, I missed the fact that of_dma_get_range() checks the dma-ranges property in the node's parent. So what we need is setting the default dma mask based on the size information in dma-ranges to something like this:

	mask = rounddown_pow_of_two(size) - 1;
	dma_set_mask(dev, mask);	/* or dma_set_mask_and_coherent() */

While we currently don't have a set of swiotlb DMA ops on ARM32, we do have it on ARM64, and I think we should be using them properly. It should really not be hard to implement a proper dma_set_mask() function for ARM64 that is able to set up the swiotlb based on the dma-ranges properties and always returns success but leaves the mask unchanged.

The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise we don't have any guarantees.
Since we can't honour random masks anyway, we stick to ZONE_DMA, which is currently within the 4G limit. But the driver calls dma_set_mask() too late for any further swiotlb setup. With IOMMU we can be more flexible around dma_set_mask(), since it can be done at run-time. What we can do with swiotlb is to check if the mask is smaller than ZONE_DMA. If it ever is, we have to fail dma_set_mask and hope the driver can fall back to PIO mode, or it will have to fail its probe() function.

dma_set_(coherent_)mask check swiotlb_dma_supported(), which returns false if io_tlb_end goes beyond the device mask. So we just need to ensure that io_tlb is allocated within ZONE_DMA.

Makes sense for dma_set_mask. Why do you do the same thing for coherent_mask? Shouldn't that check against ZONE_DMA instead?

It depends on the meaning of coherent_dma_mask. As I understand it, that's the dma mask used for dma_alloc_coherent() to return a memory buffer that the device is able to access. I don't see it much different from the dma_mask used by the streaming API. I guess some hardware could have different masks here if they have cache coherency only for certain memory ranges (and the coherent_dma_mask would be smaller than dma_mask).

For dma_set_coherent_mask(), we also have to fail any call that tries to set a mask larger than what the device hardware can do. Unlike that, dma_set_mask() can succeed with any mask; we just have to enable swiotlb if the mask that the driver wants is larger than what the hardware can do.

Currently we can't satisfy any arbitrarily small dma mask even with swiotlb, since the bounce buffer is just guaranteed to be in ZONE_DMA. Swiotlb allows for smaller masks but we need to reserve the io_tlb buffer early during boot and at smaller addresses. For example, swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA) and if the coherent_dma_mask isn't matched, it frees the pages and falls back to the io_tlb buffer.
However, I don't think it's worth going for masks smaller than 32-bit on arm64.

Is that safe for noncoherent systems? I'd expect the io_tlb buffer to be cached there, which means we can't use it for coherent allocations.

For non-coherent systems, the current arm64 approach is to take the address returned by swiotlb_alloc_coherent() and remap it as Normal NonCacheable (see the arm64 __dma_alloc_noncoherent()). The CPU should only access the dma buffer via the non-cacheable mapping. Of course, another approach would be to change the cacheability attributes of the io_tlb buffer (or the CMA one), but the current approach has the advantage that the io_tlb buffer can be used for both coherent and non-coherent devices.

--
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Mon, May 19, 2014 at 09:32:43AM +0100, Arnd Bergmann wrote:
On Friday 16 May 2014 13:55:01 Catalin Marinas wrote:
On Thu, May 15, 2014 at 05:53:53PM +0100, Liviu Dudau wrote:
On Thu, May 15, 2014 at 04:36:25PM +0100, Alan Stern wrote:
On Thu, 15 May 2014, Liviu Dudau wrote:
On Thu, May 15, 2014 at 03:11:48PM +0100, Alan Stern wrote:
On Wed, 14 May 2014, Mark Brown wrote:

arm64 architecture handles 64bit DMAs correctly and can enable support for 64bit EHCI host controllers.

Did you folks test this with all sorts of host controllers? I have no way to verify that it works, and last I heard, many (or even most) controllers don't work right with 64-bit DMA.

I have tested it with a host controller that is capable of 64-bit DMA and without this change it doesn't work.

What do you mean it doesn't work? Can't the host controller use 32-bit DMA?

It could if arm64 would restrict the DMA addresses to 32-bit, but it doesn't, and I end up on my platform with USB DMA buffers allocated above the 4GB boundary.

dma_alloc_coherent() on arm64 should return 32-bit addresses if the coherent_dma_mask is set to 32-bit. Which kernel version is this?

The more important question is what happens to high buffers allocated elsewhere that get passed into dma_map_sg by a device driver. Depending on the DT properties of the device and its parents, this needs to do one of three things:

a) translate the 64-bit virtual address into a 64-bit bus address
b) create an IOMMU entry for the 64-bit address and pass the 32-bit IOMMU address to the driver
c) use the swiotlb code to create a bounce buffer at a 32-bit DMA address and copy the data around

It's definitely wrong to just hardcode a DMA mask in the driver because that code doesn't know which of the three cases is being used. Moreover, you can't do it using an #ifdef CONFIG_ARM64, because it's completely independent of the architecture, and we need to do the exact same logic on ARM32 and any other architecture.

I agree.
The problem we currently have is the system topology description needed to pass the DMA mask in a hierarchical way. I can see Santosh's patches introducing dma-ranges, but the coherent dma mask is still set to 32-bit. We can use the dma-ranges to infer a mask, but that's only specific to the device, and the driver doesn't know whether it goes through an iommu or not.

--
Catalin
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
On Mon, May 19, 2014 at 09:32:43AM +0100, Arnd Bergmann wrote:

The more important question is what happens to high buffers allocated elsewhere that get passed into dma_map_sg by a device driver. Depending on the DT properties of the device and its parents, this needs to do one of three things:

a) translate the 64-bit virtual address into a 64-bit bus address
b) create an IOMMU entry for the 64-bit address and pass the 32-bit IOMMU address to the driver
c) use the swiotlb code to create a bounce buffer at a 32-bit DMA address and copy the data around

It's definitely wrong to just hardcode a DMA mask in the driver because that code doesn't know which of the three cases is being used. Moreover, you can't do it using an #ifdef CONFIG_ARM64, because it's completely independent of the architecture, and we need to do the exact same logic on ARM32 and any other architecture.

I agree. The problem we currently have is the system topology description needed to pass the DMA mask in a hierarchical way. I can see Santosh's patches introducing dma-ranges, but the coherent dma mask is still set to 32-bit. We can use the dma-ranges to infer a mask, but that's only specific to the device, and the driver doesn't know whether it goes through an iommu or not.

We definitely have to fix this very quickly, before people start building real arm64 systems and shipping them. We should not merge any hacks that attempt to work around the problem, but try to come to a conclusion how to handle them properly. My hope was that we could just always set the dma mask to whatever the DT says it should be to keep the burden from device drivers, unless they want to restrict it further (e.g. when the specific peripheral hardware has a bug that prevents us from using high addresses, even though the SoC in theory supports it everywhere).

I agree.
Rob Herring argued that we should always mimic PCI and call dma_set_mask() in drivers but default to a 32-bit mask otherwise, independent of whether the hardware can do more or less than that, IIRC.

Can we not default to something built up from dma-ranges? Or 32-bit if the dma-ranges property is missing?

While we currently don't have a set of swiotlb DMA ops on ARM32, we do have it on ARM64, and I think we should be using them properly. It should really not be hard to implement a proper dma_set_mask() function for ARM64 that is able to set up the swiotlb based on the dma-ranges properties and always returns success but leaves the mask unchanged.

The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise we don't have any guarantees. Since we can't honour random masks anyway, we stick to ZONE_DMA, which is currently within the 4G limit. But the driver calls dma_set_mask() too late for any further swiotlb setup. With IOMMU we can be more flexible around dma_set_mask(), since it can be done at run-time.

--
Catalin
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Thu, May 15, 2014 at 05:53:53PM +0100, Liviu Dudau wrote:
On Thu, May 15, 2014 at 04:36:25PM +0100, Alan Stern wrote:
On Thu, 15 May 2014, Liviu Dudau wrote:
On Thu, May 15, 2014 at 03:11:48PM +0100, Alan Stern wrote:
On Wed, 14 May 2014, Mark Brown wrote:

arm64 architecture handles 64bit DMAs correctly and can enable support for 64bit EHCI host controllers.

Did you folks test this with all sorts of host controllers? I have no way to verify that it works, and last I heard, many (or even most) controllers don't work right with 64-bit DMA.

I have tested it with a host controller that is capable of 64-bit DMA and without this change it doesn't work.

What do you mean it doesn't work? Can't the host controller use 32-bit DMA?

It could if arm64 would restrict the DMA addresses to 32-bit, but it doesn't, and I end up on my platform with USB DMA buffers allocated above the 4GB boundary.

dma_alloc_coherent() on arm64 should return 32-bit addresses if the coherent_dma_mask is set to 32-bit. Which kernel version is this?

--
Catalin
Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64
On Fri, May 16, 2014 at 06:08:45PM +0100, Jon Medhurst (Tixy) wrote:
On Fri, 2014-05-16 at 13:55 +0100, Catalin Marinas wrote:
[...]

It could if arm64 would restrict the DMA addresses to 32-bit, but it doesn't, and I end up on my platform with USB DMA buffers allocated above the 4GB boundary.

dma_alloc_coherent() on arm64 should return 32-bit addresses if the coherent_dma_mask is set to 32-bit.

Not if you have CONFIG_DMA_CMA. Unless I have misread the code, enabling CMA means memory comes from a common pool carved out at boot with no way for drivers to specify its restrictions [1]. It's what I've spent most of the week trying to work around in a clean way, and have finally given up.

The easiest hack would be to pass a limit to dma_contiguous_reserve() in arm64_memblock_init(), something like this:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 51d5352e6ad5..558434c69612 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -162,7 +162,7 @@ void __init arm64_memblock_init(void)
 	}
 	early_init_fdt_scan_reserved_mem();
-	dma_contiguous_reserve(0);
+	dma_contiguous_reserve(dma_to_phys(NULL, DMA_BIT_MASK(32)) + 1);
 	memblock_allow_resize();
 	memblock_dump_all();

probably with a check for IS_ENABLED(CONFIG_ZONE_DMA) (we do this for swiotlb initialisation). At some point, if we have some system topology description, we could decide whether we need to limit the above based on the dma coherent masks.

--
Catalin
Re: Memory synchronization vs. interrupt handlers
On 26 August 2013 16:49, Alan Stern st...@rowland.harvard.edu wrote:

Here's a question that doesn't seem to be answered in Documentation/memory-barriers.txt. Are memory accesses within an interrupt handler synchronized with respect to interrupts? In more detail, suppose we have an interrupt handler that uses a memory variable A. The device attached to the IRQ line sends two interrupt requests, and we get:

	CPU 0				CPU 1
	-----				-----
	Receive IRQ
	Call the interrupt handler
	Write A
	Finish IRQ processing
					Receive IRQ
					Call the interrupt handler
					Read A
					Finish IRQ processing

Is CPU 0's write to A guaranteed to be visible on CPU 1? Given that interrupts on an IRQ line are serialized, and that IRQ processing must involve some amount of memory barriers, I would expect the answer to be Yes.

On arm (or arm64), this is not guaranteed. Finishing the IRQ processing usually involves a device write, but there is no ordering required with other write accesses. It could easily be fixed in the IRQ controller code (drivers/irqchip/irq-gic.c for newer ARM processors). We have a barrier for SMP cross-calls for the same ordering reason, so I guess we need one for EOI as well.

In practice I would think it's nearly impossible to hit this issue: given the time to save the state when taking the interrupt plus some spinlocks, the write from CPU 0 would become visible. Also, if the data is accessed by the same driver with the same IRQ, most likely the interrupt goes to the same CPU (unless you had some rebalancing, but this being rarer makes it less likely). If the data is being accessed between two separate IRQ handlers, a spinlock must be used anyway.

--
Catalin