Re: [PATCH v8 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-01-12 Thread Baoquan He
On 01/12/15 at 04:00pm, Li, ZhenHua wrote:
 Compared to v7, this version adds only a few lines of code:
 
 In function copy_page_table,
 
 + __iommu_flush_cache(iommu, phys_to_virt(dma_pte_next),
 + VTD_PAGE_SIZE);

So this addition fixes the reported dmar fault on Takao's system, right?

 
 
 On 01/12/2015 03:06 PM, Li, Zhen-Hua wrote:
 This patchset is an update of Bill Sumner's patchset. It implements a fix for
 the following problem: if a kernel boots with intel_iommu=on on a system that
 supports intel vt-d, then when a panic happens, the kdump kernel will boot
 with these faults:
 
  dmar: DRHD: handling fault status reg 102
  dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
  DMAR:[fault reason 01] Present bit in root entry is clear
 
  dmar: DRHD: handling fault status reg 2
  dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
  INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
 
 On some systems, the interrupt remapping fault will also happen even if
 intel_iommu is not set to on, because interrupt remapping will be enabled
 when x2apic is needed by the system.
 
 The cause of the DMA fault is described in Bill's original version, and the
 INTR-Remap fault has a similar cause. In short, the initialization of the
 vt-d drivers causes in-flight DMA and interrupt requests to get the wrong
 response.
 
 To fix this problem, we modify the behavior of intel vt-d in the
 crashdump kernel:
 
 For DMA Remapping:
 1. Accept the vt-d hardware in an active state.
 2. Do not disable and re-enable the translation; keep it enabled.
 3. Use the old root entry table; do not rewrite the RTA register.
 4. Malloc and use a new context entry table and page table, and copy data
 from the old ones that were used by the old kernel.
 5. Use different portions of the iova address ranges for the device drivers
 in the crashdump kernel than the iova ranges that were in use at the time
 of the panic.
 6. After a device driver is loaded, when it issues the first dma_map command,
 free the dmar_domain structure for this device and generate a new one, so
 that the device can be assigned a new and empty page table.
 7. When a new context entry table is generated, also save its address to
 the old root entry table.
 
 For Interrupt Remapping:
 1. Accept the vt-d hardware in an active state.
 2. Do not disable and re-enable the interrupt remapping; keep it enabled.
 3. Use the old interrupt remapping table; do not rewrite the IRTA register.
 4. When an ioapic entry is set up, the interrupt remapping table is changed,
 and the updated data is stored to the old interrupt remapping table.
 
 Advantages of this approach:
 1. All manipulation of the IO-device is done by the Linux device-driver
 for that device.
 2. This approach behaves in a manner very similar to operation without an
 active iommu.
 3. Any activity between the IO-device and its RMRR areas is handled by the
 device-driver in the same manner as during a non-kdump boot.
 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
 This supports the practice of creating a special kdump kernel without
 drivers for any devices that are not required for taking a crashdump.
 5. Minimal code changes to the existing mainline intel vt-d code.
 
 Summary of changes in this patch set:
 1. Added some useful functions for the root entry table in intel-iommu.c.
 2. Added new members to struct root_entry and struct irte.
 3. Functions to load the old root entry table into iommu->root_entry from the
 memory of the old kernel.
 4. Functions to malloc a new context entry table and page table and copy the
 data from the old ones to the newly malloced ones.
 5. Functions to enable support for DMA remapping in the kdump kernel.
 6. Functions to load old irte data from the old kernel into the kdump kernel.
 7. Some code changes that support the other behaviours listed above.
 8. In the new functions, physical addresses use the unsigned long type, not
 pointers.
 
 Original version by Bill Sumner:
  https://lkml.org/lkml/2014/1/10/518
  https://lkml.org/lkml/2014/4/15/716
  https://lkml.org/lkml/2014/4/24/836
 
 Zhenhua's updates:
  https://lkml.org/lkml/2014/10/21/134
  https://lkml.org/lkml/2014/12/15/121
  https://lkml.org/lkml/2014/12/22/53
  https://lkml.org/lkml/2015/1/6/1166
 
 Changelog[v8]:
  1. Add a missing __iommu_flush_cache in function copy_page_table.
 
 Changelog[v7]:
  1. Use __iommu_flush_cache to flush the data to hardware.
 
 Changelog[v6]:
  1. Use unsigned long as type of physical address.
  2. Use new function unmap_device_dma to unmap the old dma.
  3. Fix a small incorrect bit order for the aw shift.
 
 Changelog[v5]:
  1. Do not disable and re-enable translation and interrupt remapping.
  2. Use old root entry table.
  3. Use old interrupt remapping 

Re: [PATCH v8 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-01-12 Thread Li, ZhenHua

On 01/12/2015 05:07 PM, Baoquan He wrote:

On 01/12/15 at 04:00pm, Li, ZhenHua wrote:

Compared to v7, this version adds only a few lines of code:

In function copy_page_table,

+   __iommu_flush_cache(iommu, phys_to_virt(dma_pte_next),
+   VTD_PAGE_SIZE);


So this addition fixes the reported dmar fault on Takao's system, right?

I am not sure whether it can fix the dmar fault on Takao's system, but
I hope it does.







Re: [PATCH v8 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-01-12 Thread Li, ZhenHua

Compared to v7, this version adds only a few lines of code:

In function copy_page_table,

+   __iommu_flush_cache(iommu, phys_to_virt(dma_pte_next),
+   VTD_PAGE_SIZE);


On 01/12/2015 03:06 PM, Li, Zhen-Hua wrote:

[...]

Changelog[v5]:
 1. Do not disable and re-enable translation and interrupt remapping.
 2. Use old root entry table.
 3. Use old interrupt remapping table.
 4. New functions to copy data from old kernel, and save to old kernel mem.
 5. New functions to save updated root entry table and irte table.
 6. Use intel_unmap to unmap the old dma;
 7. Allocate new 

Re: [PATCH RESEND] dma-mapping: tidy up dma_parms default handling

2015-01-12 Thread Robin Murphy

On 09/01/15 19:45, Arnd Bergmann wrote:

On Friday 09 January 2015 16:56:03 Robin Murphy wrote:


This one's a bit tricky to find a home for - I think technically it's
probably an IOMMU patch, but then the long-underlying problem doesn't
seem to have blown up anything until arm64, and my motivation is to
make bits of Juno work, which seems to nudge it towards arm64/arm-soc
territory. Could anyone suggest which tree is most appropriate?


I have a set of patches touching various dma-mapping.h related bits
across architectures and in ARM in particular. Your patch fits into
that series, and I guess we could either have it in my asm-generic
tree or in Andrew Morton's mm tree. Possibly also arm-soc for practical
reasons, although it really doesn't belong in there.



Thanks Arnd, I'd agree asm-generic or mm sound the most sensible - if
you're happy to carry this patch with your series that'd be really helpful.


Robin.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 01/10] iommu/vt-d: Update iommu_attach_domain() and its callers

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 03:06:19PM +0800, Li, Zhen-Hua wrote:
 Allow specification of the domain-id for the new domain.
 This patch only adds the 'did' parameter to iommu_attach_domain()
 and modifies all of its callers to specify the default value of -1,
 which means "no did specified, allocate a new one".

I think it's better to keep the old iommu_attach_domain() interface in
place and introduce a new function (like iommu_attach_domain_with_id()
or something) which has the additional parameter. Then you can rewrite
iommu_attach_domain():

	iommu_attach_domain(...)
	{
		return iommu_attach_domain_with_id(..., -1);
	}

This way you don't have to update all the callers of
iommu_attach_domain() and the interface is more readable.
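
The wrapper idea above can be sketched as a small standalone C model. This is
only an illustration under stated assumptions: struct domain_model, struct
iommu_model and the id-allocation logic are hypothetical stand-ins, not the
intel-iommu.c types or the patch's code. Only the two function names come from
the suggestion itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's domain and iommu structures. */
struct domain_model { int id; };
struct iommu_model  { int next_free_id; };

/* New entry point with the extra parameter; did == -1 means
 * "no did specified, allocate a new one". */
static int iommu_attach_domain_with_id(struct domain_model *domain,
				       struct iommu_model *iommu, int did)
{
	assert(domain != NULL && iommu != NULL);

	if (did < 0)
		did = iommu->next_free_id++;
	domain->id = did;
	return did;
}

/* Old interface, unchanged for existing callers. */
static int iommu_attach_domain(struct domain_model *domain,
			       struct iommu_model *iommu)
{
	return iommu_attach_domain_with_id(domain, iommu, -1);
}
```

Existing callers keep calling iommu_attach_domain() as before, and only the
kdump path would need the variant that takes an explicit id.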


Joerg



Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
 +
 +#ifdef CONFIG_CRASH_DUMP
 +
 +/*
 + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
 + *
 + * Fixes the crashdump kernel to deal with an active iommu and legacy
 + * DMA from the (old) panicked kernel in a manner similar to how legacy
 + * DMA is handled when no hardware iommu was in use by the old kernel --
 + * allow the legacy DMA to continue into its current buffers.
 + *
 + * In the crashdump kernel, this code:
 + * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
 + * 2. Do not re-enable IOMMU's translating.
 + * 3. In kdump kernel, use the old root entry table.
 + * 4. Leaves the current translations in-place so that legacy DMA will
 + *    continue to use its current buffers.
 + * 5. Allocates to the device drivers in the crashdump kernel
 + *    portions of the iova address ranges that are different
 + *    from the iova address ranges that were being used by the old kernel
 + *    at the time of the panic.
 + *
 + */

It looks like you are still copying the io-page-tables from the old
kernel into the kdump kernel, is that right? With the approach that was
proposed you only need to copy over the context entries 1-1. They are
still pointing to the page-tables in the old kernel's memory (which is
just fine).

The root-entry of the old kernel is also re-used, and when the kdump
kernel starts to use a device, its context entry is updated to point to
a newly allocated page-table.
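
As a rough userspace model of the approach described above (not the patch's
code): the context entries are copied 1:1, so they keep pointing at the old
kernel's page tables, and a single entry is repointed only when the kdump
kernel starts to use that device. The struct layout, the CTX_* masks and both
function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_ENTRIES   256u            /* assumed: one entry per devfn on a bus */
#define CTX_PRESENT   0x1ull
#define CTX_ADDR_MASK (~0xfffull)     /* page-aligned page-table root address */

struct ctx_entry_model {
	uint64_t lo;                  /* present bit + page-table root address */
	uint64_t hi;                  /* other attributes, copied untouched */
};

/* Step 1: copy the old context table 1:1. The page tables themselves are
 * not copied; every entry still refers to the old kernel's memory. */
static void inherit_context_table(struct ctx_entry_model *dst,
				  const struct ctx_entry_model *src)
{
	memcpy(dst, src, CTX_ENTRIES * sizeof(*dst));
}

/* Step 2: when the kdump kernel takes over one device, point only that
 * entry at a freshly allocated (empty) page table. */
static void claim_device(struct ctx_entry_model *tbl, unsigned int devfn,
			 uint64_t new_pgd_phys)
{
	assert(devfn < CTX_ENTRIES);

	tbl[devfn].lo = (tbl[devfn].lo & ~CTX_ADDR_MASK)
		      | (new_pgd_phys & CTX_ADDR_MASK)
		      | CTX_PRESENT;
}
```

Devices that never get a driver in the kdump kernel keep their inherited
entries, so their in-flight DMA continues into the old buffers.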


Joerg



Re: Exynos IOMMU driver doesn't work?

2015-01-12 Thread Hongbo Zhang
On 9 January 2015 at 23:34, Javier Martinez Canillas
jav...@dowhile0.org wrote:
 [adding Marek, Sjoerd and Joonyoung, who were discussing iommu
 support in another thread]

Thank you Javier.


 Hello Hongbo,

 On Fri, Jan 9, 2015 at 8:31 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote:
 Add linux-samsung-...@vger.kernel.org mailing list.

 On 7 January 2015 at 18:31, Hongbo Zhang hongbo.zh...@linaro.org wrote:
 Hi Cho KyongHo, Joerg et al,
 I found the latest Exynos IOMMU driver doesn't work, the line 481:
 BUG_ON(!has_sysmmu(dev));
 in function __exynos_sysmmu_enable() in file exynos-iommu.c triggers
 kernel panic.

 Then I found that dev->archdata.iommu isn't initialized at all; that
 should be the root cause.


 That's correct, I found the same thing the other day and thought about
 posting a patch to return -ENODEV if !has_sysmmu(dev) instead, to keep
 the driver from panicking the kernel. But then I realized this is already
 fixed in Marek's [PATCH v3 00/19] Exynos SYSMMU (IOMMU) integration
 with DT and DMA-mapping subsystem series [0].
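
The "return -ENODEV instead of BUG_ON()" idea mentioned above could look
roughly like the sketch below. It is a standalone model, not the
exynos-iommu.c code: struct device_model, has_sysmmu_model() and
sysmmu_enable_model() are hypothetical names, and the iommu_data field merely
stands in for the per-device IOMMU pointer discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device with an optional IOMMU private pointer. */
struct device_model {
	void *iommu_data;             /* NULL when no System MMU was set up */
};

static int has_sysmmu_model(const struct device_model *dev)
{
	return dev->iommu_data != NULL;
}

/* Fail gracefully for devices without a System MMU instead of
 * bringing the whole kernel down. */
static int sysmmu_enable_model(struct device_model *dev)
{
	assert(dev != NULL);

	if (!has_sysmmu_model(dev))
		return -ENODEV;

	/* ... the real driver would program the System MMU here ... */
	return 0;
}
```

With this shape the caller sees an error code and can back out, while a
BUG_ON() on the same condition panics the kernel.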

 Another problem is that device tree support was added to this driver, but
 there are no device tree nodes in the dts file, so I had to search the
 internet and add those nodes manually.

 I've found these links of v12 and v13 patches
 https://lkml.org/lkml/2014/4/27/171
 https://lkml.org/lkml/2014/5/12/34
 patch v13 was merged into the mainline kernel, but as a part of v12 it
 isn't complete and doesn't work alone, e.g. the dts nodes are missing.
 (I didn't research much into which patch introduced the dev->archdata.iommu
 initialization error, but it seems very old code has no such problem.)


 Yes, please take a look at Marek's series [0]. Keep in mind that the
 series does not support all sysmmu revisions, so IOMMU is not supported
 for some SoCs (e.g. Exynos5). Support for that is planned once that
 series lands in mainline, though [1].

 May I ask why you are interested in IOMMU support on Exynos? I'm
 asking because the reason why I tried to enable IOMMU support (and hit
 the same issue) was to try using the Exynos DRM HDMI driver with IOMMU
 since I found that HDMI is working on the downstream Samsung kernel
 [2] that has IOMMU support, but is not working on mainline.

Because I am testing vfio-platform patches, IOMMU is used in this case.
http://www.spinics.net/lists/kvm-arm/msg12445.html

And I am glad to find a working kernel as you pointed out, then I
found these two commits in this tree may solve my problem:
841a7fe TEMP/TO POST: iommu: exynos: Add mmu-masters support
bd7e4c7 TEMP/TO POST: ARM: dts: add System MMU nodes of Exynos SoCs


 At the end the HDMI problem seems to not be IOMMU related but
 something with the power domains and clocking but in case you are
 facing the same issue, you may be interested in that discussion [3].

 Best regards,
 Javier

 [0]: http://www.spinics.net/lists/linux-samsung-soc/msg39168.html
 [1]: http://www.spinics.net/lists/linux-samsung-soc/msg39980.html
 [2]: g...@github.com:exynos-reference/kernel.git
 [3]: http://www.spinics.net/lists/linux-samsung-soc/msg40828.html


Re: [PATCH RESEND] dma-mapping: tidy up dma_parms default handling

2015-01-12 Thread Will Deacon
On Fri, Jan 09, 2015 at 07:45:49PM +0000, Arnd Bergmann wrote:
 On Friday 09 January 2015 16:56:03 Robin Murphy wrote:
  
  This one's a bit tricky to find a home for - I think technically it's 
  probably an IOMMU patch, but then the long-underlying problem doesn't
  seem to have blown up anything until arm64, and my motivation is to
  make bits of Juno work, which seems to nudge it towards arm64/arm-soc
  territory. Could anyone suggest which tree is most appropriate?
 
 I have a set of patches touching various dma-mapping.h related bits
 across architectures and in ARM in particular. Your patch fits into
 that series, and I guess we could either have it in my asm-generic
 tree or in Andrew Morton's mm tree. Possibly also arm-soc for practical
 reasons, although it really doesn't belong in there.

I also have a couple of fixes for issues found by Laurent for tearing
down the IOMMU dma ops, so you could include those too.

I'll send them out this afternoon.

Will


Re: Exynos IOMMU driver doesn't work?

2015-01-12 Thread Javier Martinez Canillas
Hello Hongbo,

On Mon, Jan 12, 2015 at 11:51 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote:
 On 9 January 2015 at 23:34, Javier Martinez Canillas

 Yes, please take a look at Marek's series [0]. Keep in mind that the
 series does not support all sysmmu revisions, so IOMMU is not supported
 for some SoCs (e.g. Exynos5). Support for that is planned once that
 series lands in mainline, though [1].

 May I ask why you are interested in IOMMU support on Exynos? I'm
 asking because the reason why I tried to enable IOMMU support (and hit
 the same issue) was to try using the Exynos DRM HDMI driver with IOMMU
 since I found that HDMI is working on the downstream Samsung kernel
 [2] that has IOMMU support, but is not working on mainline.

 Because I am testing vfio-platform patches, IOMMU is used in this case.
 http://www.spinics.net/lists/kvm-arm/msg12445.html


Ok, different use case then.

 And I am glad to find a working kernel as you pointed out, then I

Glad that you found the information useful.

 found these two commits in this tree may solve my problem:
 841a7fe TEMP/TO POST: iommu: exynos: Add mmu-masters support
 bd7e4c7 TEMP/TO POST: ARM: dts: add System MMU nodes of Exynos SoCs



Yes, that's what we cherry-picked as well to test HDMI since enabling
IOMMU has a side effect of turning on the right power domains and
enabling the needed clocks.

Best regards,
Javier


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Vivek Goyal
On Mon, Jan 12, 2015 at 04:22:08PM +0100, Joerg Roedel wrote:
 On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
  +
  +#ifdef CONFIG_CRASH_DUMP
  +
  +/*
  + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
  + *
  + * Fixes the crashdump kernel to deal with an active iommu and legacy
  + * DMA from the (old) panicked kernel in a manner similar to how legacy
  + * DMA is handled when no hardware iommu was in use by the old kernel --
  + * allow the legacy DMA to continue into its current buffers.
  + *
  + * In the crashdump kernel, this code:
  + * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
  + * 2. Do not re-enable IOMMU's translating.
  + * 3. In kdump kernel, use the old root entry table.
  + * 4. Leaves the current translations in-place so that legacy DMA will
  + *    continue to use its current buffers.
  + * 5. Allocates to the device drivers in the crashdump kernel
  + *    portions of the iova address ranges that are different
  + *    from the iova address ranges that were being used by the old kernel
  + *    at the time of the panic.
  + *
  + */
 
 It looks like you are still copying the io-page-tables from the old
 kernel into the kdump kernel, is that right? With the approach that was
 proposed you only need to copy over the context entries 1-1. They are
  still pointing to the page-tables in the old kernel's memory (which is
 just fine).

Kdump has the notion of a backup region, where certain parts of the old
kernel's memory can be moved to a different location (the first 640K on x86
as of now) so that the new kernel can make use of this memory.

So we will just have to make sure that no part of this old page table
falls into the backup region.
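
A check of that kind can be sketched as a plain range-overlap test. This is a
minimal model, assuming the backup region is exactly the first 640K of
physical memory as mentioned above; the BACKUP_* constants and the function
name are made up for illustration and are not existing kdump interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed backup region: the first 640K of physical memory, [0, 640K). */
#define BACKUP_START 0x0ull
#define BACKUP_END   (640ull * 1024)

/* Returns true if [phys, phys + size) touches the backup region. */
static bool overlaps_backup_region(uint64_t phys, uint64_t size)
{
	assert(phys + size >= phys);  /* caller must not pass a wrapping range */

	if (size == 0)
		return false;
	return phys < BACKUP_END && phys + size > BACKUP_START;
}
```

Each page of an inherited table would be passed through such a check before
the kdump kernel relies on the old contents still being in place.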

Thanks
Vivek


Re: [PATCH 0/3] PCI/x86: Interface for testing multivector MSI support

2015-01-12 Thread Alex Williamson
On Thu, 2015-01-08 at 09:15 -0700, Bjorn Helgaas wrote:
 On Fri, Nov 21, 2014 at 03:08:27PM -0700, Alex Williamson wrote:
  I'd like to make vfio-pci capable of manipulating the device exposed
  to the user such that if the host can only support a single MSI
  vector then we hide the fact that the device itself may actually be
  able to support more.  When we virtualize PCI config space and
  interrupt setup there's no PCI protocol for the device failing to
  allocate the number of vectors that it said were available.  If the
  userspace driver is a guest operating system, it certainly doesn't
  expect this to fail.  I don't think we can ever guarantee that a
  multi-vector request will succeed, but we can certainly guarantee
  that it will fail if the platform doesn't support it.
  
  An example device is the Atheros AR93xxx running in a Windows 7 VM.
  Both the device and the guest OS support multiple MSI vectors.  With
  interrupt remapping, such that the host supports multivector, the
  device works well in the guest.  With interrupt remapping disabled,
  the device is far less reliable because of the mismatch in MSI
  programming vs driver configuration and often fails.  If vfio-pci
  can test whether multiple vectors are supported, then we can make it
  work reliably in both cases by adjusting the exposed MSI capability,
  like in this patch that would follow this series:
  
  https://github.com/awilliam/linux-vfio/commit/9ace67515680
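
To illustrate what "adjusting the exposed MSI capability" can mean in
practice, here is a small standalone sketch. It is not taken from the linked
commit: the function names are hypothetical, and it only relies on the
Multiple Message Capable field occupying bits 3:1 of the MSI Message Control
register, encoded as log2 of the vector count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 3:1 of MSI Message Control: Multiple Message Capable (log2 encoded). */
#define MSI_FLAGS_MMC_MASK 0x000eu

/* Message Control value to show to the user: if the host cannot do
 * multivector MSI, advertise a single vector (MMC = 0). */
static uint16_t exposed_msi_flags(uint16_t hw_flags, bool host_multivector)
{
	if (!host_multivector)
		hw_flags &= (uint16_t)~MSI_FLAGS_MMC_MASK;
	return hw_flags;
}

/* Number of vectors a given Message Control value advertises. */
static unsigned int msi_vectors_advertised(uint16_t flags)
{
	unsigned int enc = (flags & MSI_FLAGS_MMC_MASK) >> 1;

	assert(enc <= 5);             /* encodings 6 and 7 are reserved */
	return 1u << enc;
}
```

A guest that reads the adjusted value never asks for more than one vector, so
the allocation cannot fail for that reason.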
  
  With this series, only x86 w/ interrupt remapping will advertise
  support for multiple MSI vectors.  In surveying the code, I couldn't
  find any other archs that allowed it, but I'll take corrections if
  that's untrue.  Thanks,
 
 Per Thomas' comments and your possible workaround if we don't have
 pci_msi_supported(), I'm going to ignore these for now.  Let me know if
 you disagree.

Yep, that's fine.  I'll either forget about this for a while or kludge
something in vfio to know that only x86 with interrupt remapping, which
I can test from the IOMMU API, has multivector MSI support.  Thanks,

Alex



Re: [PATCH] iommu/amd: Track when amd_iommu_v2 init is complete

2015-01-12 Thread Joerg Roedel
Hi Oded,

On Mon, Dec 22, 2014 at 12:23:44PM +0200, Oded Gabbay wrote:
 The drm guys suggested we move iommu/ subsystem before gpu/
 subsystem in drivers/Makefile instead of the above patch (and the
 complementing patch-set in amdkfd).
 I did that and it works, so please see this patch as discarded for now.
 I will send a new patch-set shortly.

Yeah, this is still a hack, but a better solution than tracking the
initialization order manually.


Joerg



Re: [RFC PATCH 0/4] Genericise the IOVA allocator

2015-01-12 Thread Robin Murphy

Hi Joerg,

On 12/01/15 15:52, Joerg Roedel wrote:
[...]


Thanks for doing this, I like this patch-set.

I would also appreciate if someone from Intel could have a look at it,
David?

Besides, can you please re-post this patch-set rebased to latest
upstream with the better versions of patch 1 and 2, please?

I will consider applying these changes then.



Thanks! Funnily enough, that's on my "things I didn't quite get round to
yesterday" list. I have this series rebased onto -rc3 with the comments
addressed, and I'm in the middle of a final cleanup and check of the
arm64 dma-mapping stuff on top of it. Expect to see both later today.


Robin.




Re: [PATCH v3 00/19] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem

2015-01-12 Thread Javier Martinez Canillas
Hello Joonyoung,

On 01/12/2015 07:40 AM, Joonyoung Shim wrote:
 And also making changes to the clocks in the clk-exynos5420 driver. Can
 you please explain the rationale for those changes? I'm asking because
 without your clock changes (only adding the DISP1 pd and making the
 devices consumers), I have HDMI output too, but the video is even worse. This
 [0] is the minimal change I have on top of 3.19-rc3 to have some output.
 
 
 I just referred to the patches below:
 http://comments.gmane.org/gmane.linux.kernel.samsung-soc/34576
 
 But I'm not sure whether the DISP1 power domain is the same case as the
 MFC power domain.


Thanks a lot for sharing those patches, now your changes are much
clearer to me.

 
 So there seem to be two issues here: one is the mixer and hdmi modules not
 being attached to the DISP1 power domain, and the other is the clock setup
 not being correct for proper HDMI video output.
  
 
 Hmm, I can still see normal hdmi output from the latest upstream
 kernel (3.19-rc4) with my kernel changes and the u-boot changes (DISP1 power
 domain disable) from my prior mail on an odroid xu3 board.


I thought you said on another email that after commit 2ed127697eb1 which
landed on 3.19-rc1 you had bad HDMI output?

In your changes, the SW_ACLK_400_DISP1 and USER_ACLK_400_DISP1 clock mux
outputs that go to internal buses in the DISP1 were missing. Adding IDs for
these in the exynos5420 clock driver and to the parent and input clock pairs
list in the DISP1 power domain gives me a good HDMI output on 3.19-rc2.

Also, the SW_ACLK_300_DISP1 and USER_ACLK_300_DISP1 are needed for the FIMD
parent and input clock respectively. Adding those to the clocks list of the
DISP1 power domain gives me working display + HDMI on my Exynos5800 Peach Pi.

These are the changes I have now [0]. Please let me know what you think.

 
 I didn't have this issue when testing your patch against 3.19-rc2. From your
 log I see that you are testing on a 3.18.1. So maybe makes sense to test with
 the latest kernel version since this HDMI issue qualifies as an 3.19-rc fix?
 
 Since commit 2ed127697eb1 (PM / Domains: Power on the PM domain right after 
 attach completes)
 that landed in 3.19-rc1, I see that the power domain is powered on when a
 device is attached. So maybe that is what makes a difference here?
 

 I'm not sure, but i get same error results from 3.19-rc4. Did you test
 using exynos drm driver? I used modetest of libdrm


Yes, I was not able to trigger that by running modetest, but I could by
turning off my HDMI monitor and then turning it on again. When the monitor
is turned on, I see a "Power domain power-domain disable failed" message and
the imprecise external abort error.

I had to disable CONFIG_DRM_EXYNOS_DP in order to trigger it, though, and
that is why I was not able to reproduce it before.

I think, though, that this is a separate issue from the HDMI not working,
since power domains should be able to have many consumer devices and I see
that other power domains are used that way.

Best regards,
Javier

[0]:
diff --git a/arch/arm/boot/dts/exynos5420.dtsi 
b/arch/arm/boot/dts/exynos5420.dtsi
index 0ac5e0810e97..53b0a03843f2 100644
--- a/arch/arm/boot/dts/exynos5420.dtsi
+++ b/arch/arm/boot/dts/exynos5420.dtsi
@@ -270,6 +270,19 @@
 		reg = <0x10044120 0x20>;
 	};
 
+	disp1_pd: power-domain@100440C0 {
+		compatible = "samsung,exynos4210-pd";
+		reg = <0x100440C0 0x20>;
+		clocks = <&clock CLK_FIN_PLL>, <&clock CLK_MOUT_SW_ACLK200>,
+			 <&clock CLK_MOUT_USER_ACLK200_DISP1>,
+			 <&clock CLK_MOUT_SW_ACLK300>,
+			 <&clock CLK_MOUT_USER_ACLK300_DISP1>,
+			 <&clock CLK_MOUT_SW_ACLK400>,
+			 <&clock CLK_MOUT_USER_ACLK400_DISP1>;
+		clock-names = "oscclk", "pclk0", "clk0",
+			      "pclk1", "clk1", "pclk2", "clk2";
+	};
+
 	pinctrl_0: pinctrl@1340 {
 		compatible = "samsung,exynos5420-pinctrl";
 		reg = <0x1340 0x1000>;
@@ -537,6 +550,7 @@
 	fimd: fimd@1440 {
 		clocks = <&clock CLK_SCLK_FIMD1>, <&clock CLK_FIMD1>;
 		clock-names = "sclk_fimd", "fimd";
+		samsung,power-domain = <&disp1_pd>;
 	};
 
 	adc: adc@12D1 {
@@ -710,6 +724,7 @@
 		phy = <&hdmiphy>;
 		samsung,syscon-phandle = <&pmu_system_controller>;
 		status = "disabled";
+		samsung,power-domain = <&disp1_pd>;
 	};
 
 	hdmiphy: hdmiphy@145D {
@@ -722,6 +737,7 @@
 		interrupts = <0 94 0>;
 		clocks = <&clock CLK_MIXER>, <&clock CLK_SCLK_HDMI>;
 		clock-names = "mixer", "sclk_hdmi";
+		samsung,power-domain = <&disp1_pd>;
 	};
 
 	gsc_0: video-scaler@13e0 {
diff --git a/drivers/clk/samsung/clk-exynos5420.c 
b/drivers/clk/samsung/clk-exynos5420.c
index 848d602efc06..07d666cc6a29 100644
--- a/drivers/clk/samsung/clk-exynos5420.c
+++ 

Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Vivek Goyal
On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote:
 On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
  Kdump has the notion of backup region. Where certain parts of old kernels
  memory can be moved to a different location (first 640K on x86 as of now)
  and new kernel can make use of this memory now.
  
  So we will have to just make sure that no parts of this old page table
  fall into backup region.
 
 Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
 yet ;)
 Sadly, your above statement is true for all hardware-accessible data
 structures in IOMMU code. I think about how we can solve this, is there
 an easy way to allocate memory that is not in any backup region?

Hmm..., there does not seem to be any easy way to do this. In fact, as of
now, the kernel does not even know where the backup region is. All these
details are managed completely by user space (except for the new
kexec_file_load() syscall).

That means we are left with ugly options now.

- Define per-arch kexec backup regions in the kernel, export them to user
  space and let kexec-tools make use of that definition (instead of
  defining its own). That way memory allocation code in the kernel can look
  at this backup area and skip it for certain allocations.

Thanks
Vivek


[RFC PATCH 5/5] arm64: hook up IOMMU dma_ops

2015-01-12 Thread Robin Murphy
With iommu_dma_ops in place, hook them up to the configuration code, so
IOMMU-fronted devices will get them automatically.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 arch/arm64/Kconfig   |  1 +
 arch/arm64/include/asm/dma-mapping.h | 10 +-
 arch/arm64/mm/dma-mapping.c  | 22 ++
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1f9a20..e2abcdc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -66,6 +66,7 @@ config ARM64
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_SYSCALL_TRACEPOINTS
+   select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
select MODULES_USE_ELF_RELA
select NO_BOOTMEM
diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index 82082c4..0791a78 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -45,13 +45,13 @@ static inline struct dma_map_ops *get_dma_ops(struct device 
*dev)
return __generic_dma_ops(dev);
 }
 
-static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				      struct iommu_ops *iommu, bool coherent)
-{
-	dev->archdata.dma_coherent = coherent;
-}
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+   struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops arch_setup_dma_ops
 
+void arch_teardown_dma_ops(struct device *dev);
+#define arch_teardown_dma_ops  arch_teardown_dma_ops
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 8e449a7..d52175d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -729,10 +729,32 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size,
iommu_dma_release_mapping(mapping);
 }
 
+static void __iommu_teardown_dma_ops(struct device *dev)
+{
+	if (dev->archdata.mapping) {
+		iommu_dma_detach_device(dev);
+		dev->archdata.dma_ops = NULL;
+	}
+}
+
 #else
 
 static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
  struct iommu_ops *iommu)
 { }
 
+static void __iommu_teardown_dma_ops(struct device *dev) { }
+
 #endif  /* CONFIG_IOMMU_DMA */
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
+{
+	dev->archdata.dma_coherent = coherent;
+	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+}
+
+void arch_teardown_dma_ops(struct device *dev)
+{
+   __iommu_teardown_dma_ops(dev);
+}
-- 
1.9.1




[RFC PATCH 3/5] iommu: implement common IOMMU ops for DMA mapping

2015-01-12 Thread Robin Murphy
Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.

Whilst the target is arm64, rather than introduce yet another private
implementation, place this in common code as the first step towards
consolidating the numerous versions spread around between architecture
code and IOMMU drivers.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 include/linux/dma-iommu.h |  78 
 lib/Kconfig   |   8 +
 lib/Makefile  |   1 +
 lib/dma-iommu.c   | 455 ++
 4 files changed, 542 insertions(+)
 create mode 100644 include/linux/dma-iommu.h
 create mode 100644 lib/dma-iommu.c

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
new file mode 100644
index 000..4515407
--- /dev/null
+++ b/include/linux/dma-iommu.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (C) 2014 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/.
+ */
+#ifndef __DMA_IOMMU_H
+#define __DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <linux/iommu.h>
+
+#ifdef CONFIG_IOMMU_DMA
+
+int iommu_dma_init(void);
+
+struct iommu_dma_mapping *iommu_dma_create_mapping(struct iommu_ops *ops,
+   dma_addr_t base, size_t size);
+void iommu_dma_release_mapping(struct iommu_dma_mapping *mapping);
+
+dma_addr_t iommu_dma_create_iova_mapping(struct device *dev,
+   struct page **pages, size_t size, bool coherent);
+int iommu_dma_release_iova_mapping(struct device *dev, dma_addr_t iova,
+   size_t size);
+
+struct page **iommu_dma_alloc_buffer(struct device *dev, size_t size,
+   gfp_t gfp, struct dma_attrs *attrs,
+   void (clear_buffer)(struct page *page, size_t size));
+int iommu_dma_free_buffer(struct device *dev, struct page **pages, size_t size,
+   struct dma_attrs *attrs);
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   struct dma_attrs *attrs);
+dma_addr_t iommu_dma_coherent_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   struct dma_attrs *attrs);
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+   enum dma_data_direction dir, struct dma_attrs *attrs);
+
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+   enum dma_data_direction dir, struct dma_attrs *attrs);
+int iommu_dma_coherent_map_sg(struct device *dev, struct scatterlist *sg,
+   int nents, enum dma_data_direction dir,
+   struct dma_attrs *attrs);
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, int nents,
+   enum dma_data_direction dir, struct dma_attrs *attrs);
+
+int iommu_dma_attach_device(struct device *dev, struct iommu_dma_mapping 
*mapping);
+void iommu_dma_detach_device(struct device *dev);
+
+int iommu_dma_supported(struct device *hwdev, u64 mask);
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
+phys_addr_t iova_to_phys(struct device *dev, dma_addr_t dev_addr);
+
+#else
+
+static inline phys_addr_t iova_to_phys(struct device *dev, dma_addr_t dev_addr)
+{
+   return 0;
+}
+
+#endif  /* CONFIG_IOMMU_DMA */
+
+#endif /* __KERNEL__ */
+#endif /* __DMA_IOMMU_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 54cf309..965d027 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -518,4 +518,12 @@ source lib/fonts/Kconfig
 config ARCH_HAS_SG_CHAIN
def_bool n
 
+#
+# IOMMU-agnostic DMA-mapping layer
+#
+config IOMMU_DMA
+   def_bool n
+	depends on IOMMU_SUPPORT && ARCH_HAS_SG_CHAIN && NEED_SG_DMA_LENGTH
+   select IOMMU_IOVA
+
 endmenu
diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..e4b6134 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -103,6 +103,7 @@ obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o
 
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
 obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 obj-$(CONFIG_NOTIFIER_ERROR_INJECTION) += notifier-error-inject.o
 obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o
diff --git a/lib/dma-iommu.c 

[RFC PATCH 1/5] arm64: Combine coherent and non-coherent swiotlb dma_ops

2015-01-12 Thread Robin Murphy
From: Catalin Marinas catalin.mari...@arm.com

Since dev_archdata now has a dma_coherent state, combine the two
coherent and non-coherent operations and remove their declaration,
together with set_dma_ops, from the arch dma-mapping.h file.

Signed-off-by: Catalin Marinas catalin.mari...@arm.com
---
 arch/arm64/include/asm/dma-mapping.h |  11 +---
 arch/arm64/mm/dma-mapping.c  | 116 ---
 2 files changed, 54 insertions(+), 73 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index 9ce3e68..6932bb5 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -28,8 +28,6 @@
 
 #define DMA_ERROR_CODE (~(dma_addr_t)0)
 extern struct dma_map_ops *dma_ops;
-extern struct dma_map_ops coherent_swiotlb_dma_ops;
-extern struct dma_map_ops noncoherent_swiotlb_dma_ops;
 
 static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
 {
@@ -47,23 +45,18 @@ static inline struct dma_map_ops *get_dma_ops(struct device 
*dev)
return __generic_dma_ops(dev);
 }
 
-static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
-{
-	dev->archdata.dma_ops = ops;
-}
-
 static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 				      struct iommu_ops *iommu, bool coherent)
 {
 	dev->archdata.dma_coherent = coherent;
-	if (coherent)
-		set_dma_ops(dev, &coherent_swiotlb_dma_ops);
 }
 #define arch_setup_dma_ops arch_setup_dma_ops
 
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
+   if (!dev)
+   return false;
 	return dev->archdata.dma_coherent;
 }
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index d920942..0a24b9b 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -134,16 +134,17 @@ static void __dma_free_coherent(struct device *dev, 
size_t size,
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
-static void *__dma_alloc_noncoherent(struct device *dev, size_t size,
-dma_addr_t *dma_handle, gfp_t flags,
-struct dma_attrs *attrs)
+static void *__dma_alloc(struct device *dev, size_t size,
+dma_addr_t *dma_handle, gfp_t flags,
+struct dma_attrs *attrs)
 {
struct page *page;
void *ptr, *coherent_ptr;
+   bool coherent = is_device_dma_coherent(dev);
 
size = PAGE_ALIGN(size);
 
-	if (!(flags & __GFP_WAIT)) {
+	if (!coherent && !(flags & __GFP_WAIT)) {
 		struct page *page = NULL;
 		void *addr = __alloc_from_pool(size, &page);
 
@@ -151,13 +152,16 @@ static void *__dma_alloc_noncoherent(struct device *dev, 
size_t size,
*dma_handle = phys_to_dma(dev, page_to_phys(page));
 
return addr;
-
}
 
ptr = __dma_alloc_coherent(dev, size, dma_handle, flags, attrs);
if (!ptr)
goto no_mem;
 
+   /* no need for non-cacheable mapping if coherent */
+   if (coherent)
+   return ptr;
+
/* remove any dirty cache lines on the kernel alias */
__dma_flush_range(ptr, ptr + size);
 
@@ -179,15 +183,17 @@ no_mem:
return NULL;
 }
 
-static void __dma_free_noncoherent(struct device *dev, size_t size,
-  void *vaddr, dma_addr_t dma_handle,
-  struct dma_attrs *attrs)
+static void __dma_free(struct device *dev, size_t size,
+  void *vaddr, dma_addr_t dma_handle,
+  struct dma_attrs *attrs)
 {
void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));
 
-   if (__free_from_pool(vaddr, size))
-   return;
-   vunmap(vaddr);
+   if (!is_device_dma_coherent(dev)) {
+   if (__free_from_pool(vaddr, size))
+   return;
+   vunmap(vaddr);
+   }
__dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
 }
 
@@ -199,7 +205,8 @@ static dma_addr_t __swiotlb_map_page(struct device *dev, 
struct page *page,
dma_addr_t dev_addr;
 
dev_addr = swiotlb_map_page(dev, page, offset, size, dir, attrs);
-   __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+   if (!is_device_dma_coherent(dev))
+   __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, 
dir);
 
return dev_addr;
 }
@@ -209,7 +216,8 @@ static void __swiotlb_unmap_page(struct device *dev, 
dma_addr_t dev_addr,
 size_t size, enum dma_data_direction dir,
 struct dma_attrs *attrs)
 {
-   __dma_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+   if 

[RFC PATCH 2/5] arm64: implement generic IOMMU configuration

2015-01-12 Thread Robin Murphy
Add the necessary call to of_iommu_init.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 arch/arm64/kernel/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index b809911..8304141 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -40,6 +40,7 @@
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
 #include <linux/memblock.h>
+#include <linux/of_iommu.h>
 #include <linux/of_fdt.h>
 #include <linux/of_platform.h>
 #include <linux/efi.h>
@@ -424,6 +425,7 @@ void __init setup_arch(char **cmdline_p)
 
 static int __init arm64_device_init(void)
 {
+   of_iommu_init();
of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
return 0;
 }
-- 
1.9.1




[RFC PATCH 0/5] arm64: IOMMU-backed DMA mapping

2015-01-12 Thread Robin Murphy
Hi all,

Whilst it's a long way off perfect, this has reached the point of being
functional and stable enough to be useful, so here it is. The core
consists of the meat of the arch/arm implementation modified to remove
the assumption of PAGE_SIZE pages and ported over to the Intel IOVA
allocator instead of the bitmap-based one. For that, this series depends
on my "Genericise the IOVA allocator" series posted earlier[1].

There are plenty of obvious things still to do, including:

 * Domain and group handling is all wrong, but that's a bigger problem.
   For the moment it does more or less the same thing as the arch/arm
   code, which at least works for the one-IOMMU-per-device situation.
 * IOMMU domains and IOVA domains probably want to be better integrated
   with devices and each other, rather than having a proliferation of
   arch-specific structs.
 * The temporary map_sg implementation - I have a 'proper' iommu_map_sg
   based one in progress, but since the simple one works it's not been
   as high a priority.
 * Port arch/arm over to it. I'd guess it might be preferable to merge
   this through arm64 first, though, rather than overcomplicate matters.
 * There may well be scope for streamlining and tidying up the copied
   parts - In general I've simply avoided touching anything I don't
   fully understand.
 * In the same vein, I'm sure lots of it is fairly ARM-specific, so will
   need longer-term work to become truly generic.

[1]:http://thread.gmane.org/gmane.linux.kernel.iommu/8208

Catalin Marinas (1):
  arm64: Combine coherent and non-coherent swiotlb dma_ops

Robin Murphy (4):
  arm64: implement generic IOMMU configuration
  iommu: implement common IOMMU ops for DMA mapping
  arm64: add IOMMU dma_ops
  arm64: hook up IOMMU dma_ops

 arch/arm64/Kconfig   |   1 +
 arch/arm64/include/asm/device.h  |   3 +
 arch/arm64/include/asm/dma-mapping.h |  33 +--
 arch/arm64/kernel/setup.c|   2 +
 arch/arm64/mm/dma-mapping.c  | 435 -
 include/linux/dma-iommu.h|  78 ++
 lib/Kconfig  |   8 +
 lib/Makefile |   1 +
 lib/dma-iommu.c  | 455 +++
 9 files changed, 938 insertions(+), 78 deletions(-)
 create mode 100644 include/linux/dma-iommu.h
 create mode 100644 lib/dma-iommu.c

-- 
1.9.1




[RFC PATCH 4/5] arm64: add IOMMU dma_ops

2015-01-12 Thread Robin Murphy
Taking some inspiration from the arch/arm code, implement the
arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 arch/arm64/include/asm/device.h  |   3 +
 arch/arm64/include/asm/dma-mapping.h |  12 ++
 arch/arm64/mm/dma-mapping.c  | 297 +++
 3 files changed, 312 insertions(+)

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 243ef25..c17f100 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,9 @@ struct dev_archdata {
struct dma_map_ops *dma_ops;
 #ifdef CONFIG_IOMMU_API
void *iommu;/* private IOMMU data */
+#ifdef CONFIG_IOMMU_DMA
+   struct iommu_dma_mapping *mapping;
+#endif
 #endif
bool dma_coherent;
 };
diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index 6932bb5..82082c4 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -64,11 +64,23 @@ static inline bool is_device_dma_coherent(struct device 
*dev)
 
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
+#ifdef CONFIG_IOMMU_DMA
+	/* We don't have an easy way of dealing with this... */
+	BUG_ON(dev->archdata.mapping);
+#endif
return (dma_addr_t)paddr;
 }
 
+#ifdef CONFIG_IOMMU_DMA
+phys_addr_t iova_to_phys(struct device *dev, dma_addr_t dev_addr);
+#endif
+
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
+#ifdef CONFIG_IOMMU_DMA
+	if (dev->archdata.mapping)
+		return iova_to_phys(dev, dev_addr);
+#endif
return (phys_addr_t)dev_addr;
 }
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 0a24b9b..8e449a7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -23,6 +23,7 @@
 #include <linux/genalloc.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
 
@@ -426,6 +427,9 @@ static int __init arm64_dma_init(void)
 
ret |= swiotlb_late_init();
ret |= atomic_pool_init();
+#ifdef CONFIG_IOMMU_DMA
+   ret |= iommu_dma_init();
+#endif
 
return ret;
 }
@@ -439,3 +443,296 @@ static int __init dma_debug_do_init(void)
return 0;
 }
 fs_initcall(dma_debug_do_init);
+
+
+#ifdef CONFIG_IOMMU_DMA
+
+static struct page **__atomic_get_pages(void *addr)
+{
+   struct page *page;
+   phys_addr_t phys;
+
+   phys = gen_pool_virt_to_phys(atomic_pool, (unsigned long)addr);
+   page = phys_to_page(phys);
+
+   return (struct page **)page;
+}
+
+static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
+{
+   struct vm_struct *area;
+
+   if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
+   return __atomic_get_pages(cpu_addr);
+
+   area = find_vm_area(cpu_addr);
+   if (!area)
+   return NULL;
+
+	return area->pages;
+}
+
+static void *__iommu_alloc_atomic(struct device *dev, size_t size,
+ dma_addr_t *handle, bool coherent)
+{
+   struct page *page;
+   void *addr;
+
+	addr = __alloc_from_pool(size, &page);
+	if (!addr)
+		return NULL;
+
+	*handle = iommu_dma_create_iova_mapping(dev, &page, size, coherent);
+   if (*handle == DMA_ERROR_CODE) {
+   __free_from_pool(addr, size);
+   return NULL;
+   }
+   return addr;
+}
+
+static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
+   dma_addr_t handle, size_t size)
+{
+   iommu_dma_release_iova_mapping(dev, handle, size);
+   __free_from_pool(cpu_addr, size);
+}
+
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+   void *ptr = page_address(page);
+
+   memset(ptr, 0, size);
+   __dma_flush_range(ptr, ptr + size);
+}
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
+{
+   bool coherent = is_device_dma_coherent(dev);
+   pgprot_t prot = coherent ? __pgprot(PROT_NORMAL) :
+  __pgprot(PROT_NORMAL_NC);
+   struct page **pages;
+   void *addr = NULL;
+
+   *handle = DMA_ERROR_CODE;
+   size = PAGE_ALIGN(size);
+
+	if (!(gfp & __GFP_WAIT))
+   return __iommu_alloc_atomic(dev, size, handle, coherent);
+   /*
+* Following is a work-around (a.k.a. hack) to prevent pages
+* with __GFP_COMP being passed to split_page() which cannot
+* handle them.  The real problem is that this flag probably
+* should be 0 on ARM as it is not supported on this
+* platform; see CONFIG_HUGETLBFS.
+*/
+	gfp &= ~(__GFP_COMP);
+
+   pages = iommu_dma_alloc_buffer(dev, size, 

RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2015-01-12 Thread Wu, Feng


 -Original Message-
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 Sent: Friday, January 09, 2015 10:56 PM
 To: Radim Krčmář; Wu, Feng
 Cc: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
 g...@kernel.org; dw...@infradead.org; j...@8bytes.org;
 alex.william...@redhat.com; jiang@linux.intel.com; eric.au...@linaro.org;
 linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org;
 k...@vger.kernel.org
 Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
 VT-d PI
 
 
 
 On 09/01/2015 15:54, Radim Krčmář wrote:
  There are two points relevant to this patch in new KVM's implementation,
  (KVM: x86: amend APIC lowest priority arbitration,
   https://lkml.org/lkml/2015/1/9/362)
 
  1) lowest priority depends on TPR
  2) there is no need for balancing
 
  (1) has to be considered with PI as well.
 
 The chipset doesn't support it. :(
 
  I kept (2) to avoid whining from people building on that behaviour, but
  lowest priority backed by PI could be transparent without it.
 
  Patch below removes the balancing, but I am not sure this is a price we
  allowed ourselves to pay ... what are your opinions?
 
 I wouldn't mind, but it requires a lot of benchmarking.

In fact, real hardware may do lowest priority in a round-robin way; the new
hardware doesn't even consider the TPR for lowest-priority interrupt delivery.
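
To illustrate the difference between the two policies, here is a
hypothetical sketch. It is not KVM code: struct demo_vcpu and both helpers
are made up for this example, and it assumes "lowest priority" simply means
the candidate with the smallest TPR value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate description; not a KVM structure. */
struct demo_vcpu {
	int id;
	unsigned int tpr;	/* task priority register value */
};

/*
 * TPR-based arbitration: return the index of the candidate with the
 * smallest TPR (ties go to the first one), or -1 if there are none.
 */
int pick_lowest_tpr(const struct demo_vcpu *v, size_t n)
{
	size_t i, best = 0;

	if (n == 0)
		return -1;

	for (i = 1; i < n; i++)
		if (v[i].tpr < v[best].tpr)
			best = i;

	return (int)best;
}

/*
 * Round-robin arbitration: ignore the TPR and hand out candidates in
 * turn, remembering the position in *cursor. Returns -1 if n == 0.
 */
int pick_round_robin(size_t n, size_t *cursor)
{
	size_t pick;

	assert(cursor != NULL);

	if (n == 0)
		return -1;

	pick = *cursor % n;
	*cursor = pick + 1;

	return (int)pick;
}
```

With the first helper the destination changes only when the TPR values
change; with the second it rotates regardless of them.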

As discussed with Paolo before, I will submit a patch to support lowest
priority for PI after this series is merged.

Thanks,
Feng

 
 Paolo

Re: [PATCH v8 01/10] iommu/vt-d: Update iommu_attach_domain() and its callers

2015-01-12 Thread Li, ZhenHua

On 01/12/2015 11:18 PM, Joerg Roedel wrote:

On Mon, Jan 12, 2015 at 03:06:19PM +0800, Li, Zhen-Hua wrote:

Allow specification of the domain-id for the new domain.
This patch only adds the 'did' parameter to iommu_attach_domain()
and modifies all of its callers to specify the default value of -1
which says no did specified, allocate a new one.


I think it's better to keep the old iommu_attach_domain() interface in
place and introduce a new function (like iommu_attach_domain_with_id()
or something) which has the additional parameter. Then you can rewrite
iommu_attach_domain():

iommu_attach_domain(...)
{
return iommu_attach_domain_with_id(..., -1);
}

This way you don't have to update all the callers of
iommu_attach_domain() and the interface is more readable.


Joerg



That's a good way. I will do this in the next version.
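
Roughly, I understand the suggestion as the following pattern. This is a
standalone sketch only: the demo_* names, DID_AUTO, MAX_DIDS and the toy id
allocator are placeholders for illustration, not the real intel-iommu code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
#define DID_AUTO	(-1)	/* "no did specified, allocate a new one" */
#define MAX_DIDS	8

struct demo_iommu {
	unsigned char did_in_use[MAX_DIDS];
};

/*
 * New entry point: attach with a caller-chosen domain id, or with
 * DID_AUTO to take the first free one. Returns the id used, or -1.
 */
int demo_attach_domain_with_id(struct demo_iommu *iommu, int did)
{
	int i;

	assert(iommu != NULL);

	if (did == DID_AUTO) {
		for (i = 0; i < MAX_DIDS; i++) {
			if (!iommu->did_in_use[i]) {
				iommu->did_in_use[i] = 1;
				return i;
			}
		}
		return -1;	/* no free id left */
	}

	if (did < 0 || did >= MAX_DIDS || iommu->did_in_use[did])
		return -1;	/* out of range or already taken */

	iommu->did_in_use[did] = 1;
	return did;
}

/* Old interface, unchanged for existing callers: a thin wrapper. */
int demo_attach_domain(struct demo_iommu *iommu)
{
	return demo_attach_domain_with_id(iommu, DID_AUTO);
}
```

Existing callers keep calling the old function; only the kdump path would
need the variant that takes an explicit id.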

Thanks
Zhenhua




Re: [PATCH v3 00/19] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem

2015-01-12 Thread Joonyoung Shim
Hi,

On 01/13/2015 01:09 AM, Javier Martinez Canillas wrote:
 Hello Joonyoung,
 
 On 01/12/2015 07:40 AM, Joonyoung Shim wrote:
 And also making changes to the clocks in the clk-exynos5420 driver. Can
 you please explain the rationale for those changes? I'm asking because
 without your clock changes (only adding the DISP1 pd and making the
 devices as consumers), I've HDMI output too but video is even worse. This
 [0] is the minimal change I have on top of 3.19-rc3 to have some output.


 I just refer below patches,
 http://comments.gmane.org/gmane.linux.kernel.samsung-soc/34576

 But i'm not sure whether DISP1 power domain is same case with MFC power 
 domain.

 
 Thanks a lot for sharing those patches, now your changes are much more
 clear to me.
 

 So there seems to be two issues here, one is the mixer and hdmi modules not
 being attached to the DISP1 power domain and another one is the clocks setup
 not being correct to have proper HDMI video output.
  

 Hmm, i can see normal hdmi output still from latest upstream
 kernel(3.19-rc4) with my kernel changes and u-boot changes(DISP1 power
 domain disable) of prior mail on odroid xu3 board.

 
 I thought you said on another email that after commit 2ed127697eb1 which
 landed on 3.19-rc1 you had bad HDMI output?
 
 In your changes, it was missing the SW_ACLK_400_DISP1 and USER_ACLK_400_DISP1
 clock mux outputs that goes to internal buses in the DISP1. Adding IDs for
 these in the exynos5420 clock driver and to the parent and input clock paris
 list in the DISP1 power domain gives me a good HDMI output on 3.19-rc2.
 
 Also, the SW_ACLK_300_DISP1 and USER_ACLK_300_DISP1 are needed for the FIMD
 parent and input clock respectively. Adding those to the clocks list of the
 DISP1 power domain gives me working display + HDMI on my Exynos5800 Peach Pi.
 
 These are the changes I have now [0]. Please let me know what you think.
 

Good, it works with your patch, without the u-boot changes and without
reverting commit 2ed127697eb.


 I didn't have this issue when testing your patch against 3.19-rc2. From your
 log I see that you are testing on a 3.18.1. So maybe makes sense to test 
 with
 the latest kernel version since this HDMI issue qualifies as an 3.19-rc fix?

 Since commit 2ed127697eb1 (PM / Domains: Power on the PM domain right 
 after attach completes)
 that landed in 3.19-rc1, I see that the power domain is powered on when a
 device is attached. So maybe that is what makes a difference here?


 I'm not sure, but i get same error results from 3.19-rc4. Did you test
 using exynos drm driver? I used modetest of libdrm

 
 Yes, I was not able to trigger that by running modetest but by turning off
 my HDMI monitor and then turning it on again. When the monitor is turned
 on then I see a Power domain power-domain disable failed and the imprecise
 external abort error.
 
 I had to disable CONFIG_DRM_EXYNOS_DP in order to trigger though and that
 is why I was not able to reproduce it before.
 
 I think though that this is a separate issue of the HDMI not working since
 power domains should be able to have many consumers devices and I see that
 other power domains are used that way.
 

OK, we need more investigation.

Thanks.


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
 Kdump has the notion of backup region. Where certain parts of old kernels
 memory can be moved to a different location (first 640K on x86 as of now)
 and new kernel can make use of this memory now.
 
 So we will have to just make sure that no parts of this old page table
 fall into backup region.

Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
yet ;)
Sadly, your above statement is true for all hardware-accessible data
structures in IOMMU code. I am thinking about how we can solve this. Is
there an easy way to allocate memory that is not in any backup region?

Thanks,

Joerg



Re: [RFC PATCH 0/4] Genericise the IOVA allocator

2015-01-12 Thread Joerg Roedel
Hi Robin,

On Tue, Nov 25, 2014 at 05:27:24PM +, Robin Murphy wrote:
 Hi all,
 
 I've been implementing IOMMU DMA mapping for arm64, based on tidied-up
 parts of the existing arch/arm/mm/dma-mapping.c with a clear divide
 between the arch-specific parts and the general DMA-API to IOMMU-API layer
 so that that can be shared; similar to what Ritesh started before and was
 unable to complete[1], but working in the other direction.
 
 The first part of that tidy-up involved ripping out the homebrewed IOVA
 allocator and plumbing in iova.c, necessitating the changes presented here.
 The rest is currently sat under arch/arm64 for the sake of getting it
 working quickly with minimal impact - ideally I'd move it out and port
 arch/arm before merging, but I don't know quite how impatient people are.
 Regardless of that decision, this bit stands alone, so here it is.
 
 Feel free to ignore patches 1 and 2, since I see Sakari has recently
 posted a more thorough series for that[2], that frankly looks nicer ;)
 I've merely left them in as context here.
 
 [1]:http://thread.gmane.org/gmane.linux.ports.arm.kernel/331299
 [2]:http://article.gmane.org/gmane.linux.kernel.iommu/7436
 
 Robin Murphy (4):
   iommu: build iova.c for any IOMMU
   iommu: consolidate IOVA allocator code
   iommu: make IOVA domain low limit flexible
   iommu: make IOVA domain page size explicit

Thanks for doing this, I like this patch-set.

I would also appreciate it if someone from Intel could have a look at it,
David?

Besides, can you please re-post this patch-set rebased to latest upstream
with the better versions of patches 1 and 2?

I will consider applying these changes then.


Joerg



[PATCH 1/4] iommu: allow building iova.c independently

2015-01-12 Thread Robin Murphy
In preparation for sharing the IOVA allocator, split it out under its
own Kconfig symbol.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/Kconfig  | 4 
 drivers/iommu/Makefile | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 325188e..a839ca9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -13,6 +13,9 @@ menuconfig IOMMU_SUPPORT
 
 if IOMMU_SUPPORT
 
+config IOMMU_IOVA
+   bool
+
 config OF_IOMMU
 	def_bool y
 	depends on OF && IOMMU_API
@@ -91,6 +94,7 @@ config INTEL_IOMMU
 	bool "Support for Intel IOMMU using DMA Remapping Devices"
 	depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC)
 	select IOMMU_API
+	select IOMMU_IOVA
 	select DMAR_TABLE
 	help
 	  DMA remapping (DMAR) devices support enables independent address
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 7b976f2..0b1b94e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,13 +1,14 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iova.o intel-iommu.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
 obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o
 obj-$(CONFIG_OMAP_IOMMU) += omap-iommu.o
-- 
1.9.1




[PATCH 0/4] Genericise the IOVA allocator

2015-01-12 Thread Robin Murphy
Hi all,

Here's an update of my previous RFC[1] in preparation for hooking the
IOVA allocator up to the arm64 DMA mapping API, rebased onto 3.19-rc3.

I tried rebasing patches 3 and 4 onto Sakari's RFC series[2] (the merge
conflict is pretty trivial), however I found that series applied to rc3
causes a build error in intel-iommu.c. Thus for now I've left in my
simpler patches 1 and 2 for breaking out the library. Hopefully we can
reach some consensus on that.

Tested on arm64 (DMA mapping series coming soon), and compile-tested
for x86_64_defconfig.

Changes since RFC:
Patch 1: Use a proper Kconfig symbol rather than a hack
Patch 4: sanity check for powers of two also, and clarify the comment

[1]:http://thread.gmane.org/gmane.linux.kernel.iommu/7480
[2]:http://thread.gmane.org/gmane.linux.kernel.iommu/7436

Robin Murphy (4):
  iommu: allow building iova.c independently
  iommu: consolidate IOVA allocator code
  iommu: make IOVA domain low limit flexible
  iommu: make IOVA domain page size explicit

 drivers/iommu/Kconfig   |  4 
 drivers/iommu/Makefile  |  3 ++-
 drivers/iommu/intel-iommu.c | 45 ++
 drivers/iommu/iova.c| 53 +
 include/linux/iova.h| 41 +++
 5 files changed, 103 insertions(+), 43 deletions(-)

-- 
1.9.1




[PATCH 3/4] iommu: make IOVA domain low limit flexible

2015-01-12 Thread Robin Murphy
To share the IOVA allocator with other architectures, it needs to
accommodate more general aperture restrictions; move the lower limit
from a compile-time constant to a runtime domain property to allow
IOVA domains with different requirements to co-exist.

Also reword the slightly unclear description of alloc_iova since we're
touching it anyway.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/intel-iommu.c |  9 ++---
 drivers/iommu/iova.c| 10 ++
 include/linux/iova.h|  7 +++
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5699653..275d056 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -71,6 +71,9 @@
 				__DOMAIN_MAX_PFN(gaw), (unsigned long)-1))
 #define DOMAIN_MAX_ADDR(gaw)	(((uint64_t)__DOMAIN_MAX_PFN(gaw)) << VTD_PAGE_SHIFT)
 
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN		(1)
+
 #define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT)
 #define DMA_32BIT_PFN		IOVA_PFN(DMA_BIT_MASK(32))
 #define DMA_64BIT_PFN		IOVA_PFN(DMA_BIT_MASK(64))
@@ -1632,7 +1635,7 @@ static int dmar_init_reserved_ranges(void)
 	struct iova *iova;
 	int i;
 
-	init_iova_domain(&reserved_iova_list, DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_list, IOVA_START_PFN, DMA_32BIT_PFN);
 
 	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
 		&reserved_rbtree_key);
@@ -1690,7 +1693,7 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 	int adjust_width, agaw;
 	unsigned long sagaw;
 
-	init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
@@ -4321,7 +4324,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 {
 	int adjust_width;
 
-	init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 520b8c8..a3dbba8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -55,11 +55,13 @@ void free_iova_mem(struct iova *iova)
 }
 
 void
-init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
+init_iova_domain(struct iova_domain *iovad, unsigned long start_pfn,
+   unsigned long pfn_32bit)
 {
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
 	iovad->cached32_node = NULL;
+	iovad->start_pfn = start_pfn;
 	iovad->dma_32bit_pfn = pfn_32bit;
 }
 
@@ -162,7 +164,7 @@ move_left:
 	if (!curr) {
 		if (size_aligned)
 			pad_size = iova_get_pad_size(size, limit_pfn);
-		if ((IOVA_START_PFN + size + pad_size) > limit_pfn) {
+		if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
 			spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 			return -ENOMEM;
 		}
@@ -237,8 +239,8 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova)
  * @size: - size of page frames to allocate
  * @limit_pfn: - max limit address
  * @size_aligned: - set if size_aligned address range is required
- * This function allocates an iova in the range limit_pfn to IOVA_START_PFN
- * looking from limit_pfn instead from IOVA_START_PFN. If the size_aligned
+ * This function allocates an iova in the range iovad->start_pfn to limit_pfn,
+ * searching top-down from limit_pfn to iovad->start_pfn. If the size_aligned
  * flag is set then the allocated address iova->pfn_lo will be naturally
  * aligned on roundup_power_of_two(size).
  */
diff --git a/include/linux/iova.h b/include/linux/iova.h
index ad0507c..591b196 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -16,9 +16,6 @@
 #include <linux/rbtree.h>
 #include <linux/dma-mapping.h>
 
-/* IO virtual address start page frame number */
-#define IOVA_START_PFN (1)
-
 /* iova structure */
 struct iova {
 	struct rb_node	node;
@@ -31,6 +28,7 @@ struct iova_domain {
 	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
 	struct rb_root	rbroot;		/* iova domain rbtree root */
 	struct rb_node	*cached32_node; /* Save last alloced node */
+	unsigned long	start_pfn;	/* Lower limit for this domain */
 	unsigned long	dma_32bit_pfn;
 };
 
@@ -52,7 +50,8 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
 	unsigned long pfn_hi);
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-void init_iova_domain(struct iova_domain *iovad, unsigned long 
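
For readers following the lower-limit change in this patch, here is a minimal userspace sketch of the bounds check that now reads a per-domain start_pfn instead of the IOVA_START_PFN constant. It is not the kernel code: the lowlimit_* names are hypothetical, the padding value is passed in directly rather than computed by iova_get_pad_size(), and the assert in the init function is only this sketch's own sanity check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two limit fields of struct iova_domain. */
struct lowlimit_domain {
	unsigned long start_pfn;     /* lowest pfn this domain may hand out */
	unsigned long dma_32bit_pfn; /* highest pfn below 4GB for this domain */
};

static inline void lowlimit_init(struct lowlimit_domain *d,
				 unsigned long start_pfn,
				 unsigned long pfn_32bit)
{
	/* Sketch-only sanity check; the patch itself does not add one. */
	assert(start_pfn <= pfn_32bit);
	d->start_pfn = start_pfn;
	d->dma_32bit_pfn = pfn_32bit;
}

/*
 * Mirrors the condition in the patch: the allocation fails with -ENOMEM
 * when (start_pfn + size + pad_size) > limit_pfn, so it fits otherwise.
 */
static inline bool lowlimit_fits(const struct lowlimit_domain *d,
				 unsigned long size,
				 unsigned long pad_size,
				 unsigned long limit_pfn)
{
	return (d->start_pfn + size + pad_size) <= limit_pfn;
}
```

With this shape, two domains initialised with different start_pfn values can give different answers for the same request, which is the co-existence the commit message asks for.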

[PATCH 4/4] iommu: make IOVA domain page size explicit

2015-01-12 Thread Robin Murphy
Systems may contain heterogeneous IOMMUs supporting differing minimum
page sizes, which may also not be common with the CPU page size.
Thus it is practical to have an explicit notion of IOVA granularity
to simplify handling of mapping and allocation constraints.

As an initial step, move the IOVA page granularity from an implicit
compile-time constant to a per-domain property so we can make use
of it in IOVA domain context at runtime. To keep the abstraction tidy,
extend the little API of inline iova_* helpers to parallel some of the
equivalent PAGE_* macros.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/intel-iommu.c |  9 ++---
 drivers/iommu/iova.c| 12 ++--
 include/linux/iova.h| 35 +--
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 275d056..a0f5817 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1635,7 +1635,8 @@ static int dmar_init_reserved_ranges(void)
 	struct iova *iova;
 	int i;
 
-	init_iova_domain(&reserved_iova_list, IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN,
+			DMA_32BIT_PFN);
 
 	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
 		&reserved_rbtree_key);
@@ -1693,7 +1694,8 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 	int adjust_width, agaw;
 	unsigned long sagaw;
 
-	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN,
+			DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
@@ -4324,7 +4326,8 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 {
 	int adjust_width;
 
-	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN,
+			DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index a3dbba8..9dd8208 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -55,12 +55,20 @@ void free_iova_mem(struct iova *iova)
 }
 
 void
-init_iova_domain(struct iova_domain *iovad, unsigned long start_pfn,
-   unsigned long pfn_32bit)
+init_iova_domain(struct iova_domain *iovad, unsigned long granule,
+   unsigned long start_pfn, unsigned long pfn_32bit)
 {
+	/*
+	 * IOVA granularity will normally be equal to the smallest
+	 * supported IOMMU page size; both *must* be capable of
+	 * representing individual CPU pages exactly.
+	 */
+	BUG_ON((granule > PAGE_SIZE) || !is_power_of_2(granule));
+
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
 	iovad->cached32_node = NULL;
+	iovad->granule = granule;
 	iovad->start_pfn = start_pfn;
 	iovad->dma_32bit_pfn = pfn_32bit;
 }
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 591b196..3920a19 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -28,6 +28,7 @@ struct iova_domain {
 	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
 	struct rb_root	rbroot;		/* iova domain rbtree root */
 	struct rb_node	*cached32_node; /* Save last alloced node */
+	unsigned long	granule;	/* pfn granularity for this domain */
 	unsigned long	start_pfn;	/* Lower limit for this domain */
 	unsigned long	dma_32bit_pfn;
 };
@@ -37,6 +38,36 @@ static inline unsigned long iova_size(struct iova *iova)
 	return iova->pfn_hi - iova->pfn_lo + 1;
 }
 
+static inline unsigned long iova_shift(struct iova_domain *iovad)
+{
+	return __ffs(iovad->granule);
+}
+
+static inline unsigned long iova_mask(struct iova_domain *iovad)
+{
+	return iovad->granule - 1;
+}
+
+static inline size_t iova_offset(struct iova_domain *iovad, dma_addr_t iova)
+{
+	return iova & iova_mask(iovad);
+}
+
+static inline size_t iova_align(struct iova_domain *iovad, size_t size)
+{
+	return ALIGN(size, iovad->granule);
+}
+
+static inline dma_addr_t iova_dma_addr(struct iova_domain *iovad, struct iova *iova)
+{
+	return (dma_addr_t)iova->pfn_lo << iova_shift(iovad);
+}
+
+static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova)
+{
+	return iova >> iova_shift(iovad);
+}
+
 int iommu_iova_cache_init(void);
 void iommu_iova_cache_destroy(void);
 
@@ -50,8 +81,8 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
 	unsigned long pfn_hi);
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-void 
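
As a rough illustration of what the new iova_* helpers compute, the following userspace sketch mirrors their arithmetic for a power-of-two granule. The gran_* names and the use of uint64_t in place of dma_addr_t are assumptions made so the snippet builds outside the kernel; the bit-scan loop stands in for __ffs(), the mask arithmetic stands in for ALIGN(), and the sketch does not enforce the patch's granule-versus-PAGE_SIZE check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in carrying only the granule field. */
struct gran_domain {
	unsigned long granule; /* IOVA granularity in bytes, power of two */
};

static inline void gran_init(struct gran_domain *d, unsigned long granule)
{
	/* Same power-of-two requirement the patch enforces with BUG_ON(). */
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	d->granule = granule;
}

/* Index of the lowest set bit, i.e. log2(granule); stands in for __ffs(). */
static inline unsigned long gran_shift(const struct gran_domain *d)
{
	unsigned long shift = 0;

	while (!((d->granule >> shift) & 1UL))
		shift++;
	return shift;
}

static inline unsigned long gran_mask(const struct gran_domain *d)
{
	return d->granule - 1;
}

/* Offset of an address within its granule. */
static inline size_t gran_offset(const struct gran_domain *d, uint64_t addr)
{
	return (size_t)(addr & gran_mask(d));
}

/* Round a size up to a whole number of granules. */
static inline size_t gran_align(const struct gran_domain *d, size_t size)
{
	return (size + gran_mask(d)) & ~(size_t)gran_mask(d);
}

/* Convert between bus addresses and page frame numbers. */
static inline unsigned long gran_pfn(const struct gran_domain *d, uint64_t addr)
{
	return (unsigned long)(addr >> gran_shift(d));
}

static inline uint64_t gran_dma_addr(const struct gran_domain *d,
				     unsigned long pfn_lo)
{
	return (uint64_t)pfn_lo << gran_shift(d);
}
```

For a 4096-byte granule the shift is 12 and the mask is 0xfff, so address 0x12345 splits into pfn 0x12 and offset 0x345.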

[PATCH 2/4] iommu: consolidate IOVA allocator code

2015-01-12 Thread Robin Murphy
In order to share the IOVA allocator with other architectures, break
the unnecessary dependency on the Intel IOMMU driver and move the
remaining IOVA internals to iova.c

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/intel-iommu.c | 33 ++---
 drivers/iommu/iova.c| 35 +++
 include/linux/iova.h|  3 +++
 3 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1232336..5699653 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -485,7 +485,6 @@ __setup("intel_iommu=", intel_iommu_setup);
 
 static struct kmem_cache *iommu_domain_cache;
 static struct kmem_cache *iommu_devinfo_cache;
-static struct kmem_cache *iommu_iova_cache;
 
 static inline void *alloc_pgtable_page(int node)
 {
@@ -523,16 +522,6 @@ static inline void free_devinfo_mem(void *vaddr)
 	kmem_cache_free(iommu_devinfo_cache, vaddr);
 }
 
-struct iova *alloc_iova_mem(void)
-{
-	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
-}
-
-void free_iova_mem(struct iova *iova)
-{
-	kmem_cache_free(iommu_iova_cache, iova);
-}
-
 static inline int domain_type_is_vm(struct dmar_domain *domain)
 {
 	return domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE;
@@ -3427,23 +3416,6 @@ static inline int iommu_devinfo_cache_init(void)
 	return ret;
 }
 
-static inline int iommu_iova_cache_init(void)
-{
-	int ret = 0;
-
-	iommu_iova_cache = kmem_cache_create("iommu_iova",
-					 sizeof(struct iova),
-					 0,
-					 SLAB_HWCACHE_ALIGN,
-					 NULL);
-	if (!iommu_iova_cache) {
-		printk(KERN_ERR "Couldn't create iova cache\n");
-		ret = -ENOMEM;
-	}
-
-	return ret;
-}
-
 static int __init iommu_init_mempool(void)
 {
 	int ret;
@@ -3461,7 +3433,7 @@ static int __init iommu_init_mempool(void)
 
 	kmem_cache_destroy(iommu_domain_cache);
 domain_error:
-	kmem_cache_destroy(iommu_iova_cache);
+	iommu_iova_cache_destroy();
 
 	return -ENOMEM;
 }
@@ -3470,8 +3442,7 @@ static void __init iommu_exit_mempool(void)
 {
 	kmem_cache_destroy(iommu_devinfo_cache);
 	kmem_cache_destroy(iommu_domain_cache);
-	kmem_cache_destroy(iommu_iova_cache);
-
+	iommu_iova_cache_destroy();
 }
 
 static void quirk_ioat_snb_local_iommu(struct pci_dev *pdev)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index f6b17e6..520b8c8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -18,6 +18,41 @@
  */
 
 #include <linux/iova.h>
+#include <linux/slab.h>
+
+static struct kmem_cache *iommu_iova_cache;
+
+int iommu_iova_cache_init(void)
+{
+	int ret = 0;
+
+	iommu_iova_cache = kmem_cache_create("iommu_iova",
+					 sizeof(struct iova),
+					 0,
+					 SLAB_HWCACHE_ALIGN,
+					 NULL);
+	if (!iommu_iova_cache) {
+		pr_err("Couldn't create iova cache\n");
+		ret = -ENOMEM;
+	}
+
+	return ret;
+}
+
+void iommu_iova_cache_destroy(void)
+{
+	kmem_cache_destroy(iommu_iova_cache);
+}
+
+struct iova *alloc_iova_mem(void)
+{
+	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
+}
+
+void free_iova_mem(struct iova *iova)
+{
+	kmem_cache_free(iommu_iova_cache, iova);
+}
 
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 19e81d5..ad0507c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -39,6 +39,9 @@ static inline unsigned long iova_size(struct iova *iova)
 	return iova->pfn_hi - iova->pfn_lo + 1;
 }
 
+int iommu_iova_cache_init(void);
+void iommu_iova_cache_destroy(void);
+
 struct iova *alloc_iova_mem(void);
 void free_iova_mem(struct iova *iova);
 void free_iova(struct iova_domain *iovad, unsigned long pfn);
-- 
1.9.1




Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 11:15:38AM -0500, Vivek Goyal wrote:
 On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote:
  On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
   Kdump has the notion of a backup region, where certain parts of the old
   kernel's memory can be moved to a different location (first 640K on x86 as
   of now) and the new kernel can then make use of this memory.
   
   So we will have to just make sure that no parts of this old page table
   fall into backup region.
  
  Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
  yet ;)
  Sadly, your above statement is true for all hardware-accessible data
  structures in IOMMU code. I am thinking about how we can solve this. Is
  there an easy way to allocate memory that is not in any backup region?
 
 Hmm..., there does not seem to be any easy way to do this. In fact, as of
 now, the kernel does not even know where the backup region is. All these
 details are managed by user space completely (except for the new
 kexec_file_load() syscall).
 
 That means we are left with ugly options now.
 
 - Define per-arch kexec backup regions in the kernel, export them to user
   space, and let kexec-tools make use of that definition (instead of
   defining its own). That way the memory allocation code in the kernel can
   look at this backup area and skip it for certain allocations.

Yes, that makes sense. In fact, I think all allocations for DMA memory
need to take this into account to avoid potentially serious data
corruption.
If any memory for a disk superblock gets allocated in backup memory and
a kdump happens, the new kernel might zero out that area, and the disk
controller then writes the zeroes to disk instead of the superblock.


Joerg
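
To make the proposed "skip the backup area for certain allocations" idea concrete, here is a minimal userspace sketch of the overlap test an allocator would need. Everything in it is an assumption for illustration: the backup_* names are hypothetical, and the region is hard-coded to the first 640K mentioned above for x86, whereas in practice the range would have to come from a per-arch definition shared with kexec-tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed backup region: the first 640K of physical memory (x86 today). */
#define BACKUP_REGION_START 0x0ULL
#define BACKUP_REGION_SIZE  (640ULL * 1024)

/*
 * Return true when the physical range [phys, phys + len) intersects the
 * assumed backup region. An allocator for hardware-accessible memory
 * could reject or retry any candidate buffer for which this is true.
 */
static inline bool backup_region_overlaps(uint64_t phys, uint64_t len)
{
	if (len == 0)
		return false;

	/* Sketch-only guard against address wraparound. */
	assert(phys <= UINT64_MAX - len);

	return phys < BACKUP_REGION_START + BACKUP_REGION_SIZE &&
	       phys + len > BACKUP_REGION_START;
}
```

A buffer that merely starts below 640K and extends past it still counts as overlapping, since the part inside the region could be relocated or overwritten while a device is still using it.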
