Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev

2016-04-27 Thread Catalin Marinas
On Wed, Apr 27, 2016 at 04:11:17PM +0200, Arnd Bergmann wrote:
> On Wednesday 27 April 2016 14:59:00 Catalin Marinas wrote:
> > 
> > I would be in favour of a dma_inherit() function as well. We could hack
> > something up in the arch code (like below) but I would rather prefer an
> > explicit dma_inherit() call by drivers creating such devices.
> > 
> > diff --git a/arch/arm64/include/asm/dma-mapping.h 
> > b/arch/arm64/include/asm/dma-mapping.h
> > index ba437f090a74..ea6fb9b0e8fa 100644
> > --- a/arch/arm64/include/asm/dma-mapping.h
> > +++ b/arch/arm64/include/asm/dma-mapping.h
> > @@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
> >  
> >  static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
> >  {
> > -   if (dev && dev->archdata.dma_ops)
> > -   return dev->archdata.dma_ops;
> > +   while (dev) {
> > +   if (dev->archdata.dma_ops)
> > +   return dev->archdata.dma_ops;
> > +   dev = dev->parent;
> > +   }
> 
> I think this would be a very bad idea: we don't want to have random
> devices be able to perform DMA just because their parent devices
> have been set up that way.

I agree, it's a big hack. It would be nice to have a simpler way to do
this in driver code rather than explicitly calling
of_dma_configure/arch_setup_dma_ops as per the original patch in this
thread.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev

2016-04-27 Thread Catalin Marinas
On Wed, Apr 27, 2016 at 08:41:06AM +0300, Felipe Balbi wrote:
> Grygorii Strashko  writes:
> > On 04/26/2016 09:17 AM, Felipe Balbi wrote:
> >> Grygorii Strashko  writes:
> >>> Currently, not all DMA parameters are configured properly for the
> >>> "xhci-hcd" platform device, which is created manually. For example:
> >>> dma_pfn_offset, dma_ops and the IOMMU configuration will not
> >>> correspond to the "dwc3" device's configuration. As a result, this
> >>> will cause problems like wrong DMA address translation on platforms
> >>> with LPAE enabled, like Keystone 2.
> >>>
> >>> When the platform is using DT boot mode, the DMA configuration will
> >>> be parsed and applied from DT. So, to fix this issue, reuse the
> >>> of_dma_configure() API and retrieve the DMA configuration for
> >>> "xhci-hcd" from the DWC3 device node.
> >> 
> >> patch is incomplete. You left out non-DT users which might suffer from
> >> the same problem.
> >
> > Honestly, I don't know how to fix it gracefully for non-DT case.
> > I can update commit message to mention that this is fix for DT case only.
> 
> no, that won't do :-) There are other users for this driver and they are
> all "out-of-compliance" when it comes to DMA usage. Apparently, the
> desired behavior is to pass correct device to DMA API which the gadget
> side is already doing (see below). For the host side, the fix has to be
> more involved.
> 
> Frankly, I'd prefer that DMA setup could be inherited from parent
> device, then it wouldn't really matter and a bunch of this could be
> simplified. Some sort of dma_inherit(struct device *dev, struct device
> *parent) would go a long way, IMHO.

I would be in favour of a dma_inherit() function as well. We could hack
something up in the arch code (like below) but I would rather prefer an
explicit dma_inherit() call by drivers creating such devices.

diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index ba437f090a74..ea6fb9b0e8fa 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -29,8 +29,11 @@ extern struct dma_map_ops dummy_dma_ops;
 
 static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
 {
-   if (dev && dev->archdata.dma_ops)
-   return dev->archdata.dma_ops;
+   while (dev) {
+   if (dev->archdata.dma_ops)
+   return dev->archdata.dma_ops;
+   dev = dev->parent;
+   }
 
/*
 * We expect no ISA devices, and all other DMA masters are expected to

-- 
Catalin


Re: dwc3 initiated xhci probe problem in arm64 4.4 kernel due to DMA setup

2016-04-15 Thread Catalin Marinas
On Fri, Apr 15, 2016 at 02:56:17PM +0300, Grygorii Strashko wrote:
> From c68225e97e8c9505aca4ceab19a0d8e4dde31b73 Mon Sep 17 00:00:00 2001
> From: Grygorii Strashko 
> Date: Thu, 31 Mar 2016 19:40:52 +0300
> Subject: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev
> 
> Currently, not all DMA parameters are configured properly for the
> "xhci-hcd" platform device, which is created manually. For example:
> dma_pfn_offset, dma_ops and the IOMMU configuration will not
> correspond to the "dwc3" device's configuration. As a result, this
> will cause problems like wrong DMA address translation on platforms
> with LPAE enabled, like Keystone 2.
> 
> When the platform is using DT boot mode, the DMA configuration will be
> parsed and applied from DT. So, to fix this issue, reuse the
> of_dma_configure() API and retrieve the DMA configuration for
> "xhci-hcd" from the DWC3 device node.
> 
> Signed-off-by: Grygorii Strashko 
> ---
>  drivers/usb/dwc3/host.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
> index c679f63..93c8ef9 100644
> --- a/drivers/usb/dwc3/host.c
> +++ b/drivers/usb/dwc3/host.c
> @@ -17,6 +17,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #include "core.h"
>  
> @@ -32,12 +33,7 @@ int dwc3_host_init(struct dwc3 *dwc)
>   return -ENOMEM;
>   }
>  
> - dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
> -
>   xhci->dev.parent= dwc->dev;
> - xhci->dev.dma_mask  = dwc->dev->dma_mask;
> - xhci->dev.dma_parms = dwc->dev->dma_parms;
> -
>   dwc->xhci = xhci;
>  
>   ret = platform_device_add_resources(xhci, dwc->xhci_resources,
> @@ -62,6 +58,15 @@ int dwc3_host_init(struct dwc3 *dwc)
>   phy_create_lookup(dwc->usb3_generic_phy, "usb3-phy",
> dev_name(&xhci->dev));
>  
> + if (IS_ENABLED(CONFIG_OF) && dwc->dev->of_node) {
> + of_dma_configure(&xhci->dev, dwc->dev->of_node);
> + } else {
> + dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
> +
> + xhci->dev.dma_mask  = dwc->dev->dma_mask;
> + xhci->dev.dma_parms = dwc->dev->dma_parms;
> + }
> +
>   ret = platform_device_add(xhci);

This looks fine to me, though I wonder whether we can make this more
generic so that we don't have to update each driver.

Question for Arnd: would it make sense to add an of_dma_configure(dev,
dev->parent->of_node) call to platform_device_add() _if_ the device
being added does not have an of_node? Alternatively, this could be done
in the arch code via bus notifiers (we used to have one on arm64 for
cache coherency but I removed it in 2189064795dc ("arm64: Implement
set_arch_dma_coherent_ops() to replace bus notifiers")).

-- 
Catalin


Re: dwc3 initiated xhci probe problem in arm64 4.4 kernel due to DMA setup

2016-04-15 Thread Catalin Marinas
On Fri, Apr 15, 2016 at 01:30:01PM +0300, Felipe Balbi wrote:
> Catalin Marinas <catalin.mari...@arm.com> writes:
> > On Fri, Apr 15, 2016 at 11:01:08AM +0100, Catalin Marinas wrote:
> >> On Fri, Apr 15, 2016 at 12:49:15PM +0300, Felipe Balbi wrote:
> >> > Catalin Marinas <catalin.mari...@arm.com> writes:
> >> > > On Thu, Apr 14, 2016 at 12:46:47PM +, David Fisher wrote:
> >> > >> dwc3 is in dual-role, with "synopsys,dwc3" specified in DT.
> >> > >> 
> >> > >> When xhci is probed, initiated from dwc3/host.c (not DT), we get :
> >> > >> xhci-hcd: probe of xhci-hcd.7.auto failed with error -5
> >> > >> This -EIO error originated from inside dma_set_mask() down in 
> >> > >> include/asm-generic/dma-mapping-common.h
> >> > >> 
> >> > >> If "generic-xhci" is specified in DT instead, it probes fine as a 
> >> > >> host-only dwc3
> >> > >> The difference between DT initiated probe and dwc3 initiated probe is 
> >> > >> that
> >> > >> when DT initiated probe gets to dma_supported, dma_supported is 
> >> > >> implemented by swiotlb_dma_supported (previously set up by a DT call 
> >> > >> to arch_dma_setup_ops).
> >> > >> Whereas when dwc3 initiated xhci probe gets to dma_supported, 
> >> > >> arch_dma_setup_ops has not been called 
> >> > >> and dma_supported is only implemented by __dummy_dma_supported, 
> >> > >> returning 0.
> >> > >> 
> >> > >> Bisecting finds the "bad" commit as 
> >> > >> 1dccb598df549d892b6450c261da54cdd7af44b4  (inbetween 4.4-rc1 and 
> >> > >> 4.4-rc2)
> >> > >> --- a/arch/arm64/include/asm/dma-mapping.h
> >> > >> --- a/arch/arm64/mm/dma-mapping.c
> >> > >> 
> >> > >> Previous to this commit, dma_ops = &swiotlb_dma_ops was done in 
> >> > >> arm64_dma_init
> >> > >> After this commit, the assignment is only done in arch_dma_setup_ops.
> >> > >
> >> > > This restriction was added on purpose and the arm64 __generic_dma_ops()
> >> > > now has a comment:
> >> > >
> >> > >/*
> >> > > * We expect no ISA devices, and all other DMA masters are 
> >> > > expected to
> >> > > * have someone call arch_setup_dma_ops at device creation time.
> >> > > */
> >> > 
> >> > how ?
> >> 
> >> Usually by calling arch_setup_dma_ops().
> >
> > Or of_dma_configure(), I forgot to mention this (see the
> > pci_dma_configure() function as an example).
> 
> the device is manually created, there's not OF/DT for it ;-)

As for PCI, the created device doesn't have a node but the bridge does.
Something like below, completely untested:

diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index c679f63783ae..96d8babd7f23 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -32,7 +32,10 @@ int dwc3_host_init(struct dwc3 *dwc)
return -ENOMEM;
}
 
-   dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
+   if (IS_ENABLED(CONFIG_OF) && dwc->dev->of_node)
+   of_dma_configure(&xhci->dev, dwc->dev->of_node);
+   else
+   dma_set_coherent_mask(&xhci->dev, dwc->dev->coherent_dma_mask);
 
xhci->dev.parent= dwc->dev;
xhci->dev.dma_mask  = dwc->dev->dma_mask;

-- 
Catalin


Re: dwc3 initiated xhci probe problem in arm64 4.4 kernel due to DMA setup

2016-04-15 Thread Catalin Marinas
On Thu, Apr 14, 2016 at 12:46:47PM +, David Fisher wrote:
> dwc3 is in dual-role, with "synopsys,dwc3" specified in DT.
> 
> When xhci is probed, initiated from dwc3/host.c (not DT), we get :
> xhci-hcd: probe of xhci-hcd.7.auto failed with error -5
> This -EIO error originated from inside dma_set_mask() down in 
> include/asm-generic/dma-mapping-common.h
> 
> If "generic-xhci" is specified in DT instead, it probes fine as a host-only 
> dwc3
> The difference between DT initiated probe and dwc3 initiated probe is that
> when DT initiated probe gets to dma_supported, dma_supported is 
> implemented by swiotlb_dma_supported (previously set up by a DT call to 
> arch_dma_setup_ops).
> Whereas when dwc3 initiated xhci probe gets to dma_supported, 
> arch_dma_setup_ops has not been called 
> and dma_supported is only implemented by __dummy_dma_supported, returning 0.
> 
> Bisecting finds the "bad" commit as 
> 1dccb598df549d892b6450c261da54cdd7af44b4  (inbetween 4.4-rc1 and 4.4-rc2)
> --- a/arch/arm64/include/asm/dma-mapping.h
> --- a/arch/arm64/mm/dma-mapping.c
> 
> Previous to this commit, dma_ops = &swiotlb_dma_ops was done in arm64_dma_init
> After this commit, the assignment is only done in arch_dma_setup_ops.

This restriction was added on purpose and the arm64 __generic_dma_ops()
now has a comment:

/*
 * We expect no ISA devices, and all other DMA masters are expected to
 * have someone call arch_setup_dma_ops at device creation time.
 */

The commit above also describes why it is dangerous to assume a fallback
to swiotlb ops.

> We're not using any dwc3-.c wrapper here, but we've not needed it 
> before this commit. Relevant ?
> It might also be possible that we've messed up some KConfig that changed 
> between versions ?
> Or is this a bug that needs patching - in arm(64) dma or something different 
> for dma setup in dwc3/host.c ?

What I don't understand is why of_dma_configure() is not called for a
device with compatible="synopsys,dwc3" in DT (IIUC, you said that you
specify this in DT and of_platform_populate should do the right
configuration).

-- 
Catalin


Re: [PATCH 1/3] Honor ACPI _CCA attribute setting

2015-08-14 Thread Catalin Marinas
On Fri, Aug 14, 2015 at 08:45:22AM +0700, Suravee Suthikulpanit wrote:
 On 8/13/15 04:51, Jeremy Linton wrote:
 ACPI configurations can now mark devices as noncoherent,
 support that choice.
 
 Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
 ---
   include/acpi/acpi_bus.h | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
 index 83061ca..7ecb8e4 100644
 --- a/include/acpi/acpi_bus.h
 +++ b/include/acpi/acpi_bus.h
 @@ -399,7 +399,7 @@ static inline bool acpi_check_dma(struct acpi_device 
 *adev, bool *coherent)
   * case 1. Do not support and disable DMA.
   * case 2. Support but rely on arch-specific cache maintenance for
   * non-coherence DMA operations.
 - * Currently, we implement case 1 above.
 + * Currently, we implement case 2 above.
   *
   * For the case when _CCA is missing (i.e. cca_seen=0) and
   * platform specifies ACPI_CCA_REQUIRED, we do not support DMA,
 @@ -407,7 +407,8 @@ static inline bool acpi_check_dma(struct acpi_device 
 *adev, bool *coherent)
   *
   * See acpi_init_coherency() for more info.
   */
 -if (adev->flags.coherent_dma) {
 +if (adev->flags.coherent_dma ||
 +(adev->flags.cca_seen && IS_ENABLED(CONFIG_ARM64))) {
  ret = true;
  if (coherent)
  *coherent = adev->flags.coherent_dma;
 
 
 This change was in my earlier revisions for the original patch series to add
 ACPI CCA support. At the time, this was pushed back since we were not sure
 whether this would be a useful case, and whether such hardware exists.
 
 Would it be useful to document somewhere (may be in the GIT commit message)
 about which hardware might need this?

So far, it's the ARM Juno development board (the emphasis here is on
being able to use it for development, not a production system).

I think the commit log should also give you credit for the original
implementation.

 Arnd/Catalin, any feedback on this?

That's where it was left in the previous thread:

https://lkml.org/lkml/2015/5/21/376

(and I'll refrain from further comments ;))

-- 
Catalin


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-22 Thread Catalin Marinas
On Mon, May 19, 2014 at 09:21:17AM +0100, Jon Medhurst (Tixy) wrote:
 On Fri, 2014-05-16 at 18:40 +0100, Catalin Marinas wrote:
  On Fri, May 16, 2014 at 06:08:45PM +0100, Jon Medhurst (Tixy) wrote:
   On Fri, 2014-05-16 at 13:55 +0100, Catalin Marinas wrote:
   [...]
 It could if arm64 would restrict the DMA addresses to 32-bit, but it
 doesn't, and I end up on my platform with USB DMA buffers allocated
 above the 4GB boundary.

dma_alloc_coherent() on arm64 should return 32-bit addresses if the
coherent_dma_mask is set to 32-bit.
   
   Not if you have CONFIG_DMA_CMA. Unless I have misread the code, enabling
   CMA means memory comes from a common pool carved out at boot with no way
   for drivers to specify it's restrictions [1]. It's what I've spent most
   of the week trying to work around in a clean way, and have finally given
   up.
  
  The easiest hack would be to pass a limit dma_contiguous_reserve()
  in arm64_memblock_init(), something like this:
 
 That is by far the easiest but I dismissed it because it affects all
 platforms built from that source tree; and if the hack were extended to
 include a kernel config option, that still may not work on a single
 kernel binary expected to work on multiple platforms. Basically, I've
 tried various was to do it 'properly' and after failing am resorting to
 truly hideous hacks to the (out-of-tree driver) code - so at least other
 platforms won't be impacted.

Can you set a specific reserved memory region in the DT to be used by
CMA (via linux,cma-default), or it's just for the default size?

On arm64 we enabled CONFIG_DMA_CMA by default but compared to swiotlb it
doesn't honour GFP_DMA. The arm32 port sets the CMA limit to
arm_dma_limit which is 32-bit or a SoC-define one. So I'm tempted to
default to 32-bit as well if it can be overridden via DT.

-- 
Catalin


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-21 Thread Catalin Marinas
On Tue, May 20, 2014 at 08:37:10PM +0100, Arnd Bergmann wrote:
 On Tuesday 20 May 2014 15:12:49 Catalin Marinas wrote:
  On Tue, May 20, 2014 at 12:08:58PM +0100, Arnd Bergmann wrote:
   On Tuesday 20 May 2014 12:02:46 Catalin Marinas wrote:
On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:
 On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:
  On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
   On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
 We probably want to default to 32-bit for arm32 in the absence of 
 dma-ranges.
 For arm64, I'd prefer if we could always mandate dma-ranges to be 
 present
 for each bus, just like we mandate ranges to be present.
 I hope it's not too late for that.
 
 dma_set_mask should definitely look at the dma-ranges properties, and 
 the
 helper that Santosh just introduced should give us all the information
 we need. We just need to decide on the correct behavior.

Last time I looked at Santosh's patches I thought the dma-ranges is per
device rather than per bus. We could make it per bus only and let the
device call dma_set_mask() explicitly if it wants to restrict it
further.
   
   Can you check again? I've read the code again yesterday to check this,
   and I concluded it was correctly doing this per bus.
  
  You are right, I missed the fact that of_dma_get_range() checks the
  dma-ranges property in the node's parent.
  
  So what we need is setting the default dma mask based on the size
  information in dma-ranges to something like this:
  
  mask = rounddown_pow_of_two(size) - 1;
  dma_set_mask(dev, mask);/* or dma_set_mask_and_coherent() */
 
 I don't think we should be calling dma_set_mask here, since that would
 just go and parse the same property again once we fix the implementation.
 
 Since this is a low-level helper, we can probably just assign the dma mask
 directly.

I was thinking of calling it in of_dma_configure() as that's where we
already set the default dma_mask and coherent_dma_mask. Default to
32-bit if no dma-ranges are present.

   While we currently don't have a set of swiotlb DMA ops on ARM32, 
   we do
   have it on ARM64, and I think we should be using them properly. 
   It should
   really not be hard to implement a proper dma_set_mask() function 
   for
   ARM64 that gets is able to set up the swiotlb based on the 
   dma-ranges
   properties and always returns success but leaves the mask 
   unchanged.
  
  The swiotlb bounce buffer needs to be pre-allocated at boot, 
  otherwise
  we don't have any guarantees. Since we can't honour random masks 
  anyway,
  we stick to ZONE_DMA which is currently in the 4G limit. But the 
  driver
  calls dma_set_mask() too late for any further swiotlb setup.
  
  With IOMMU we can be more flexible around dma_set_mask(), can be 
  done at
  run-time.
 
 What we can do with swiotlb is to check if the mask is smaller than 
 ZONE_DMA.
 If it ever is, we have to fail dma_set_mask and hope the driver can 
 fall
 back to PIO mode or it will have to fail its probe() function.

dma_set_(coherent_)mask check swiotlb_dma_supported() which returns
false if io_tlb_end goes beyond the device mask. So we just need to
ensure that io_tlb is allocated within ZONE_DMA.
   
   Makes sense for dma_set_mask. Why do you do the same thing for
   coherent_mask? Shouldn't that check against ZONE_DMA instead?
  
  It depends on the meaning of coherent_dma_mask. As I understand it,
  that's the dma mask use for dma_alloc_coherent() to return a memory
  buffer that the device is able to access. I don't see it much different
  from the dma_mask used by the streaming API. I guess some hardware could
  have different masks here if they have cache coherency only for certain
  memory ranges (and the coherent_dma_mask would be smaller than dma_mask).
 
 The point I was trying to make is a different one: For the streaming mapping,
 you can allow any mask as long as the swiotlb is able to create a bounce
 buffer.

dma_mask is a hardware parameter and is used by the streaming DMA API
implementation (which could be swiotlb) to decide whether to bounce or
not. The streaming DMA API can take any CPU address as input,
independent of the dma_mask, and bounce accordingly. If it doesn't have
a bounce buffer matching the dma_mask, dma_supported() would return
false.

 This does not work for the coherent mapping, because that by definition
 cannot use bounce buffers.

Yes but most of the time these masks have the same value.

 For dma_set_coherent_mask(), we also have to fail any call that tries 
 to
 set a mask larger than what the device hardware can do. Unlike that,
 dma_set_mask() can succeed with any mask, we just have to enable 
 swiotlb
 if the mask that the driver wants is larger than what the hardware can do.

Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-21 Thread Catalin Marinas
On Wed, May 21, 2014 at 03:43:39PM +0100, Arnd Bergmann wrote:
 On Wednesday 21 May 2014 14:56:36 Catalin Marinas wrote:
  On Tue, May 20, 2014 at 08:37:10PM +0100, Arnd Bergmann wrote:
   On Tuesday 20 May 2014 15:12:49 Catalin Marinas wrote:
So what we need is setting the default dma mask based on the size
information in dma-ranges to something like this:

mask = rounddown_pow_of_two(size) - 1;
dma_set_mask(dev, mask);/* or 
dma_set_mask_and_coherent() */
   
   I don't think we should be calling dma_set_mask here, since that would
   just go and parse the same property again once we fix the implementation.
   
   Since this is a low-level helper, we can probably just assign the dma mask
   directly.
  
  I was thinking of calling it in of_dma_configure() as that's where we
  already set the default dma_mask and coherent_dma_mask. Default to
  32-bit if no dma-ranges are present.
 
 Right. Actually it should also be capped to 32-bit, to allow compatibility
 with drivers that don't call dma_set_mask and can't do 64-bit DMA. This
 is the normal behavior for PCI drivers. They need to set a 64-bit mask
 and check the result.

OK.

   The point I was trying to make is a different one: For the streaming 
   mapping,
   you can allow any mask as long as the swiotlb is able to create a bounce
   buffer.
  
  dma_mask is a hardware parameter and is used by the streaming DMA API
  implementation (which could be swiotlb) to decide whether to bounce or
  not. The streaming DMA API can take any CPU address as input,
  independent of the dma_mask, and bounce accordingly. If it doesn't have
  a bounce buffer matching the dma_mask, dma_supported() would return
  false.
 
   This does not work for the coherent mapping, because that by definition
   cannot use bounce buffers.
  
  Yes but most of the time these masks have the same value.
 
 Let me start again with an example:
 
 io_tlb_end=0x100; // 16 MB
 dma_mask  =   0x1000; // 256 MB
 ZONE_DMA  =  0x1; // 4 GB
 max_pfn   = 0x10; // 64 GB
 
 The device driver has set dma_mask and dma_coherent_mask to 256 because
 of a limitation in the hardware. This succeeded because io_tlb_end is
 well within the dma mask.
 
 Since the coherent_dma_mask is smaller than max_pfn, the swiotlb version
 of dma_alloc_coherent then allocates the buffer using GFP_DMA. However,
 that likely returns an address above dma_mask/coherent_dma_mask.

swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA). If the
returned address is not within coherent_dma_mask, it frees the allocated
pages and tries to allocate from its bounce buffer (io_tlb).

 My point was that to fix this case, you must not compare the
 coherent_dma_mask requested by the device against io_tlb_end but against
 ZONE_DMA.

See above, I think swiotlb does the right thing.

We have a problem, however, with CMA, as we assume that the returned
address is within coherent_dma_mask (we need a contiguous_dma_supported
function, similar to swiotlb_dma_supported). We also don't limit the CMA
buffer to ZONE_DMA on arm64.

   As far as I understand it, you are not allowed
   to create a noncacheable mapping for a page that also has a cachable
   mapping, or did that change between armv7 and armv8?
  
  No change between ARMv7 and ARMv8. In fact, ARMv7 has clarified what
  unpredictable means when you mix different attributes. Basically, we
  can mix the same memory type (Normal Cacheable vs Normal NonCacheable)
  but if the cacheability is different, we need to do some cache
  maintenance before accessing the non-cacheable mapping the first time
  (clear potentially dirty lines that could be evicted randomly). Any
  speculatively loaded cache lines via the cacheable mapping would not be
  hit when accessed via the non-cacheable one.
 
 Ah, only for the first access? I had remembered this differently, thanks
 for the clarification.

The first time, but assuming that you no longer write to the cacheable
alias to dirty it again (basically it's like the streaming DMA, if you
want to access the cacheable alias you need cache maintenance).

   For CMA, we use the trick to change the mapping in place, but that
   doesn't work for normal pages, which are mapped using huge tlbs.
  
  That's true for arm32 and we could do the same for arm64, though I don't
  think its worth because we could break very huge mappings (e.g. 1GB). We
  have plenty of VA space and we just create a new non-coherent mapping
  (this latter part could be optimised, even generically, to use huge
  pages where possible).
 
 I thought we specifically did this on arm32 to avoid the duplicate
 mapping because we had concluded that it would be broken even with the
 updated ARMv7 definition.

We did it because it was vaguely defined as unpredictable in the ARM
ARM for a long time until the hw guys realised it's not easy to change
Linux and decided to clarify what could actually happen.

Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-21 Thread Catalin Marinas
On Wed, May 21, 2014 at 05:15:06PM +0100, Rob Herring wrote:
 On Wed, May 21, 2014 at 10:48 AM, Arnd Bergmann a...@arndb.de wrote:
  On Wednesday 21 May 2014 10:26:01 Rob Herring wrote:
  What are you checking against to cause a failure and what do you do on
  failure? I'm guessing that PCI masks are compared to the mask of
  parent bridges and PCI devices just set the mask to 32-bit if 64-bit
  fails. That doesn't work if your mask needs to be somewhere between 32
  and 64-bit due to some bus constraints. Perhaps that's not something
  we need to worry about until we see hardware with that condition.
 
  We should compare against the size returned by of_dma_get_range(). If
  the mask requested by the driver is larger than the mask of the bus
  it's attached on, dma_set_mask should fail.
 
  We can always allow 64-bit masks if the actual bus capability is enough
  to cover all the installed RAM. That is a relatively common case.
 
 Agreed. However, if we check dma-ranges, it may be large enough for
 all of RAM, but less than a full 64-bit. There is also the edge case
 that you cannot set the size to 2^64, but only 2^64 - 1. That means
 dma_set_mask(2^64 - 1) will always fail.

Size of 0 meaning all range?

-- 
Catalin


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-20 Thread Catalin Marinas
On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:
 On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:
  On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
   On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
On Mon, May 19, 2014 at 09:32:43AM +0100, Arnd Bergmann wrote:
 The more important question is what happens to high buffers allocated 
 elsewhere
 that get passed into dma_map_sg by a device driver. Depending on the 
 DT properties
 of the device and its parents, this needs to do one of three things:
 
 a) translate the 64-bit virtual address into a 64-bit bus address
 b) create an IOMMU entry for the 64-bit address and pass the 32-bit 
 IOMMU
address to the driver
 c) use the swiotlb code to create a bounce buffer at a 32-bit DMA 
 address
and copy the data around
 
 It's definitely wrong to just hardcode a DMA mask in the driver 
 because that
 code doesn't know which of the three cases is being used. Moreover, 
 you can't
 do it using an #ifdef CONFIG_ARM64, because it's completely 
 independent of
 the architecture, and we need to do the exact same logic on ARM32 and 
 any
 other architecture.

I agree.

The problem we currently have is system topology description to pass the
DMA mask and in a hierarchical way. I can see Santosh's patches
introducing dma-ranges but the coherent dma mask still set as 32-bit. We
can use the dma-ranges to infer a mask but that's only specific to the
device and the driver doesn't know whether it goes through an iommu or
not.
   
   We definitely have to fix this very quickly, before people start building
   real arm64 systems and shipping them.
   
   We should not merge any hacks that attempt to work around the problem,
   but try to come to a conclusion how to handle them properly.
   My hope was that we could just always set the dma mask to whatever
   the DT says it should be to keep the burden from device drivers,
   unless they want to restrict it further (e.g. when the specific
   peripheral hardware has a bug that prevents us from using high addresses,
   even though the SoC in theory supports it everywhere).
  
  I agree.
  
   Rob Herring argued that we should always mimic PCI and call dma_set_mask()
   in drivers but default to a 32-bit mask otherwise, independent of whether
   the hardware can do more or less than that, IIRC.
  
  Can we not default to something built up from dma-ranges? Or 32-bit if
  dma-ranges property is missing?
 
 We probably want to default to 32-bit for arm32 in the absence of dma-ranges.
 For arm64, I'd prefer if we could always mandate dma-ranges to be present
 for each bus, just like we mandate ranges to be present.
 I hope it's not too late for that.
 
 dma_set_mask should definitely look at the dma-ranges properties, and the
 helper that Santosh just introduced should give us all the information
 we need. We just need to decide on the correct behavior.

Last time I looked at Santosh's patches I thought the dma-ranges is per
device rather than per bus. We could make it per bus only and let the
device call dma_set_mask() explicitly if it wants to restrict it
further.

   While we currently don't have a set of swiotlb DMA ops on ARM32, we do
   have it on ARM64, and I think we should be using them properly. It should
   really not be hard to implement a proper dma_set_mask() function for
   ARM64 that is able to set up the swiotlb based on the dma-ranges
   properties and always returns success but leaves the mask unchanged.
  
  The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise
  we don't have any guarantees. Since we can't honour random masks anyway,
  we stick to ZONE_DMA which is currently in the 4G limit. But the driver
  calls dma_set_mask() too late for any further swiotlb setup.
  
  With IOMMU we can be more flexible around dma_set_mask(), can be done at
  run-time.
 
 What we can do with swiotlb is to check if the mask is smaller than ZONE_DMA.
 If it ever is, we have to fail dma_set_mask and hope the driver can fall
 back to PIO mode or it will have to fail its probe() function.

dma_set_(coherent_)mask check swiotlb_dma_supported() which returns
false if io_tlb_end goes beyond the device mask. So we just need to
ensure that io_tlb is allocated within ZONE_DMA.

 For dma_set_coherent_mask(), we also have to fail any call that tries to
 set a mask larger than what the device hardware can do. Unlike that,
 dma_set_mask() can succeed with any mask, we just have to enable swiotlb
 if the mask that the driver wants is larger than what the hardware can
 do.

Currently we can't satisfy any arbitrarily small dma mask even with
swiotlb since the bounce buffer is just guaranteed to be in ZONE_DMA.
Swiotlb allows for smaller masks but we need to reserve the io_tlb
buffer early during boot and at smaller addresses. For example,
swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA) and if
the coherent_dma_mask isn't matched, it frees the pages and falls back
to the io_tlb buffer.

Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-20 Thread Catalin Marinas
On Tue, May 20, 2014 at 12:08:58PM +0100, Arnd Bergmann wrote:
 On Tuesday 20 May 2014 12:02:46 Catalin Marinas wrote:
  On Mon, May 19, 2014 at 05:55:56PM +0100, Arnd Bergmann wrote:
   On Monday 19 May 2014 16:56:08 Catalin Marinas wrote:
On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
 On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
   We probably want to default to 32-bit for arm32 in the absence of dma-ranges.
   For arm64, I'd prefer if we could always mandate dma-ranges to be present
   for each bus, just like we mandate ranges to be present.
   I hope it's not too late for that.
   
   dma_set_mask should definitely look at the dma-ranges properties, and the
   helper that Santosh just introduced should give us all the information
   we need. We just need to decide on the correct behavior.
  
  Last time I looked at Santosh's patches I thought the dma-ranges is per
  device rather than per bus. We could make it per bus only and let the
  device call dma_set_mask() explicitly if it wants to restrict it
  further.
 
 Can you check again? I've read the code again yesterday to check this,
 and I concluded it was correctly doing this per bus.

You are right, I missed the fact that of_dma_get_range() checks the
dma-ranges property in the node's parent.

So what we need is setting the default dma mask based on the size
information in dma-ranges to something like this:

mask = rounddown_pow_of_two(size) - 1;
dma_set_mask(dev, mask);   /* or dma_set_mask_and_coherent() */

 While we currently don't have a set of swiotlb DMA ops on ARM32, we do
 have it on ARM64, and I think we should be using them properly. It should
 really not be hard to implement a proper dma_set_mask() function for
 ARM64 that is able to set up the swiotlb based on the dma-ranges
 properties and always returns success but leaves the mask unchanged.

The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise
we don't have any guarantees. Since we can't honour random masks anyway,
we stick to ZONE_DMA which is currently in the 4G limit. But the driver
calls dma_set_mask() too late for any further swiotlb setup.

With IOMMU we can be more flexible around dma_set_mask(), can be done at
run-time.
   
   What we can do with swiotlb is to check if the mask is smaller than ZONE_DMA.
   If it ever is, we have to fail dma_set_mask and hope the driver can fall
   back to PIO mode or it will have to fail its probe() function.
  
  dma_set_(coherent_)mask check swiotlb_dma_supported() which returns
  false if io_tlb_end goes beyond the device mask. So we just need to
  ensure that io_tlb is allocated within ZONE_DMA.
 
 Makes sense for dma_set_mask. Why do you do the same thing for
 coherent_mask? Shouldn't that check against ZONE_DMA instead?

It depends on the meaning of coherent_dma_mask. As I understand it,
that's the dma mask used for dma_alloc_coherent() to return a memory
buffer that the device is able to access. I don't see it much different
from the dma_mask used by the streaming API. I guess some hardware could
have different masks here if they have cache coherency only for certain
memory ranges (and the coherent_dma_mask would be smaller than dma_mask).

   For dma_set_coherent_mask(), we also have to fail any call that tries to
   set a mask larger than what the device hardware can do. Unlike that,
   dma_set_mask() can succeed with any mask, we just have to enable swiotlb
   if the mask that the driver wants is larger than what the hardware can
   do.
  
  Currently we can't satisfy any arbitrarily small dma mask even with
  swiotlb since the bounce buffer is just guaranteed to be in ZONE_DMA.
  Swiotlb allows for smaller masks but we need to reserve the io_tlb
  buffer early during boot and at smaller addresses. For example,
  swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA) and if
  the coherent_dma_mask isn't matched, it frees the pages and falls back
  to the io_tlb buffer. However, I don't think it's worth going for masks
  smaller than 32-bit on arm64.
 
 Is that safe for noncoherent systems? I'd expect the io_tlb buffer
 to be cached there, which means we can't use it for coherent allocations.

For non-coherent systems, the current arm64 approach is to take the
address returned by swiotlb_alloc_coherent() and remap it as Normal
NonCacheable (see the arm64 __dma_alloc_noncoherent()). The CPU should
only access the dma buffer via the non-cacheable mapping.

Of course, another approach would be to change the cacheability
attributes of the io_tlb buffer (or the CMA one) but current approach
has the advantage that the io_tlb buffer can be used for both coherent
and non-coherent devices.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-19 Thread Catalin Marinas
On Mon, May 19, 2014 at 09:32:43AM +0100, Arnd Bergmann wrote:
 On Friday 16 May 2014 13:55:01 Catalin Marinas wrote:
  On Thu, May 15, 2014 at 05:53:53PM +0100, Liviu Dudau wrote:
   On Thu, May 15, 2014 at 04:36:25PM +0100, Alan Stern wrote:
On Thu, 15 May 2014, Liviu Dudau wrote:
 On Thu, May 15, 2014 at 03:11:48PM +0100, Alan Stern wrote:
  On Wed, 14 May 2014, Mark Brown wrote:
   arm64 architecture handles correctly 64bit DMAs and can enable support
   for 64bit EHCI host controllers.
  
  Did you folks test this for all sorts of host controllers?  I have no
  way to verify that it works, and last I heard, many (or even most)
  controllers don't work right with 64-bit DMA.
 
 I have tested it with a host controller that is capable of 64-bit DMA and
 without this change it doesn't work.

What do you mean it doesn't work?  Can't the host controller use 32-bit
DMA?
   
   It could if arm64 would restrict the DMA addresses to 32-bit, but it doesn't
   and I end up on my platform with USB DMA buffers allocated above 4GB.
  
  dma_alloc_coherent() on arm64 should return 32-bit addresses if the
  coherent_dma_mask is set to 32-bit. Which kernel version is this?
 
 The more important question is what happens to high buffers allocated elsewhere
 that get passed into dma_map_sg by a device driver. Depending on the DT
 properties of the device and its parents, this needs to do one of three things:
 
 a) translate the 64-bit virtual address into a 64-bit bus address
 b) create an IOMMU entry for the 64-bit address and pass the 32-bit IOMMU
address to the driver
 c) use the swiotlb code to create a bounce buffer at a 32-bit DMA address
and copy the data around
 
 It's definitely wrong to just hardcode a DMA mask in the driver because that
 code doesn't know which of the three cases is being used. Moreover, you can't
 do it using an #ifdef CONFIG_ARM64, because it's completely independent of
 the architecture, and we need to do the exact same logic on ARM32 and any
 other architecture.

I agree.

The problem we currently have is the lack of a system topology description
for passing the DMA mask in a hierarchical way. I can see Santosh's patches
introducing dma-ranges, but the coherent dma mask is still set as 32-bit. We
can use the dma-ranges to infer a mask, but that's only specific to the
device, and the driver doesn't know whether it goes through an IOMMU or
not.

-- 
Catalin


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-19 Thread Catalin Marinas
On Mon, May 19, 2014 at 10:44:51AM +0100, Arnd Bergmann wrote:
 On Monday 19 May 2014 10:03:40 Catalin Marinas wrote:
  On Mon, May 19, 2014 at 09:32:43AM +0100, Arnd Bergmann wrote:
   The more important question is what happens to high buffers allocated elsewhere
   that get passed into dma_map_sg by a device driver. Depending on the DT
   properties of the device and its parents, this needs to do one of three things:
   
   a) translate the 64-bit virtual address into a 64-bit bus address
   b) create an IOMMU entry for the 64-bit address and pass the 32-bit IOMMU
  address to the driver
   c) use the swiotlb code to create a bounce buffer at a 32-bit DMA address
  and copy the data around
   
   It's definitely wrong to just hardcode a DMA mask in the driver because that
   code doesn't know which of the three cases is being used. Moreover, you can't
   do it using an #ifdef CONFIG_ARM64, because it's completely independent of
   the architecture, and we need to do the exact same logic on ARM32 and any
   other architecture.
  
  I agree.
  
  The problem we currently have is the lack of a system topology description
  for passing the DMA mask in a hierarchical way. I can see Santosh's patches
  introducing dma-ranges, but the coherent dma mask is still set as 32-bit. We
  can use the dma-ranges to infer a mask, but that's only specific to the
  device, and the driver doesn't know whether it goes through an IOMMU or
  not.
 
 We definitely have to fix this very quickly, before people start building
 real arm64 systems and shipping them.
 
 We should not merge any hacks that attempt to work around the problem,
 but try to come to a conclusion how to handle them properly.
 My hope was that we could just always set the dma mask to whatever
 the DT says it should be to keep the burden from device drivers,
 unless they want to restrict it further (e.g. when the specific
 peripheral hardware has a bug that prevents us from using high addresses,
 even though the SoC in theory supports it everywhere).

I agree.

 Rob Herring argued that we should always mimic PCI and call dma_set_mask()
 in drivers but default to a 32-bit mask otherwise, independent of whether
 the hardware can do more or less than that, IIRC.

Can we not default to something built up from dma-ranges? Or 32-bit if
dma-ranges property is missing?

 While we currently don't have a set of swiotlb DMA ops on ARM32, we do
 have it on ARM64, and I think we should be using them properly. It should
 really not be hard to implement a proper dma_set_mask() function for
 ARM64 that is able to set up the swiotlb based on the dma-ranges
 properties and always returns success but leaves the mask unchanged.

The swiotlb bounce buffer needs to be pre-allocated at boot, otherwise
we don't have any guarantees. Since we can't honour random masks anyway,
we stick to ZONE_DMA which is currently in the 4G limit. But the driver
calls dma_set_mask() too late for any further swiotlb setup.

With IOMMU we can be more flexible around dma_set_mask(), can be done at
run-time.

-- 
Catalin


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-16 Thread Catalin Marinas
On Thu, May 15, 2014 at 05:53:53PM +0100, Liviu Dudau wrote:
 On Thu, May 15, 2014 at 04:36:25PM +0100, Alan Stern wrote:
  On Thu, 15 May 2014, Liviu Dudau wrote:
   On Thu, May 15, 2014 at 03:11:48PM +0100, Alan Stern wrote:
On Wed, 14 May 2014, Mark Brown wrote:
 arm64 architecture handles correctly 64bit DMAs and can enable support
 for 64bit EHCI host controllers.

Did you folks test this for all sorts of host controllers?  I have no
way to verify that it works, and last I heard, many (or even most)
controllers don't work right with 64-bit DMA.
   
   I have tested it with a host controller that is capable of 64-bit DMA and
   without this change it doesn't work.
  
  What do you mean it doesn't work?  Can't the host controller use 32-bit
  DMA?
 
 It could if arm64 would restrict the DMA addresses to 32-bit, but it doesn't
 and I end up on my platform with USB DMA buffers allocated above 4GB.

dma_alloc_coherent() on arm64 should return 32-bit addresses if the
coherent_dma_mask is set to 32-bit. Which kernel version is this?

-- 
Catalin


Re: [PATCH] usb: ehci: Enable support for 64bit EHCI host controllers in arm64

2014-05-16 Thread Catalin Marinas
On Fri, May 16, 2014 at 06:08:45PM +0100, Jon Medhurst (Tixy) wrote:
 On Fri, 2014-05-16 at 13:55 +0100, Catalin Marinas wrote:
 [...]
   It could if arm64 would restrict the DMA addresses to 32-bit, but it doesn't
   and I end up on my platform with USB DMA buffers allocated above 4GB.
  
  dma_alloc_coherent() on arm64 should return 32-bit addresses if the
  coherent_dma_mask is set to 32-bit.
 
 Not if you have CONFIG_DMA_CMA. Unless I have misread the code, enabling
 CMA means memory comes from a common pool carved out at boot with no way
 for drivers to specify its restrictions [1]. It's what I've spent most
 of the week trying to work around in a clean way, and have finally given
 up.

The easiest hack would be to pass a limit to dma_contiguous_reserve()
in arm64_memblock_init(), something like this:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 51d5352e6ad5..558434c69612 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -162,7 +162,7 @@ void __init arm64_memblock_init(void)
}
 
early_init_fdt_scan_reserved_mem();
-   dma_contiguous_reserve(0);
+   dma_contiguous_reserve(dma_to_phys(NULL, DMA_BIT_MASK(32)) + 1);
 
memblock_allow_resize();
memblock_dump_all();

probably with a check for IS_ENABLED(CONFIG_ZONE_DMA) (we do this for
swiotlb initialisation).

At some point, if we have some system topology description we could
decide whether we need to limit the above based on the dma coherent
masks.

-- 
Catalin


Re: Memory synchronization vs. interrupt handlers

2013-09-02 Thread Catalin Marinas
On 26 August 2013 16:49, Alan Stern st...@rowland.harvard.edu wrote:
 Here's a question that doesn't seem to be answered in
 Documentation/memory-barriers.txt.  Are memory accesses within an
 interrupt handler synchronized with respect to interrupts?

 In more detail, suppose we have an interrupt handler that uses a memory
 variable A.  The device attached to the IRQ line sends two interrupt
 requests, and we get:

 CPU 0   CPU 1
 -   -
 Receive IRQ
 Call the interrupt handler
 Write A
 Finish IRQ processing

 Receive IRQ
 Call the interrupt handler
 Read A
 Finish IRQ processing

 Is CPU 0's write to A guaranteed to be visible on CPU 1?  Given that
 interrupts on an IRQ line are serialized, and that IRQ processing must
 involve some amount of memory barriers, I would expect the answer to be
 "Yes".

On arm (or arm64), this is not guaranteed. Finishing the IRQ
processing usually involves a device write but there is no ordering
required with other write accesses. It could easily be fixed in the
IRQ controller code (drivers/irqchip/irq-gic.c for newer ARM
processors). We have a barrier for SMP cross-calls for the same
ordering reason, so I guess we need one for EOI as well.

In practice I would think it's nearly impossible to hit this issue:
given the time to save the state when taking the interrupt plus some
spinlocks, the write from CPU0 would become visible.

Also, if the data is accessed by the same driver with the same IRQ,
most likely the interrupt goes to the same CPU (unless you had some
rebalancing, but this being rarer it makes it less likely). If the
data is being accessed between two separate IRQ handlers, a spinlock
must be used anyway.

-- 
Catalin