Re: [RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management

2010-06-28 Thread Zach Pfeffer
FUJITA Tomonori wrote:
 On Thu, 24 Jun 2010 23:48:50 -0700
 Zach Pfeffer zpfef...@codeaurora.org wrote:
 
 Andi Kleen wrote:
 Zach Pfeffer zpfef...@codeaurora.org writes:

 This patch contains the documentation for and the main header file of
 the API, termed the Virtual Contiguous Memory Manager. Its use would
 allow all of the IOMMU to VM, VM to device and device to IOMMU
 interoperation code to be refactored into platform independent code.
 I read all the description and it's still unclear what advantage
 this all has over the current architecture? 

 At least all the benefits mentioned seem to be rather nebulous.

 Can you describe a concrete use case that is improved by this code
 directly?
 Sure. On a SoC with many IOMMUs (10-100), where each IOMMU may have
 its own set of page-tables or share page-tables, and where devices
 with and without IOMMUs and CPUs with or without MMUs want to
 communicate, an abstraction like the VCM helps manage all conceivable
 mapping topologies. In the same way that the Linux MM manages pages
 apart from page-frames, the VCMM allows the Linux MM to manage ideal
 memory regions, VCMs, apart from the actual memory regions.

 One real scenario would be video playback from a file on a memory
 card. To read and display the video, a DMA engine would read blocks of
 data from the memory card controller into memory. These would
 typically be managed using a scatter-gather list. This list would be
 mapped into a contiguous buffer of the video decoder's IOMMU. The
 video decoder would write into a buffer mapped by the display engine's
 IOMMU as well as the CPU (if the kernel needed to intercept the
 buffers). In this instance, the video decoder's IOMMU and the display
 engine's IOMMU use different page-table formats.

 Using the VCM API, this topology can be created without worrying about
 the device's IOMMUs or how to map the buffers into the kernel, or how
 to interoperate with the scatter-gather list. The call flow would go:
 
 Can you explain how you can't do the above with the existing API?

Sure. You can do the same thing with the current API, but the VCM takes a
wider view; the mapper is a parameter.

Taking include/linux/iommu.h as a common interface, the key function
is iommu_map(). This function maps a physical memory region of
2^gfp_order pages, starting at paddr, to a virtual region starting at iova:

extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, int gfp_order, int prot);

Users who call this, kvm_iommu_map_pages() for instance, run similar
loops:

foreach page
        iommu_map(domain, va(page), ...)
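
For illustration, that loop expanded into C looks roughly like the sketch
below. The helper, its parameter list and the protection flags chosen here
are made up for the example; only iommu_map() itself is the interface
quoted above:

#include <linux/iommu.h>
#include <linux/io.h>
#include <linux/mm.h>

/* Hypothetical per-driver helper: map an array of pages, one order-0
 * mapping at a time, into a single IOMMU domain. */
static int map_pages_into_domain(struct iommu_domain *domain,
                                 struct page **pages, int npages,
                                 unsigned long iova_base)
{
        int i, ret;

        for (i = 0; i < npages; i++) {
                ret = iommu_map(domain, iova_base + i * PAGE_SIZE,
                                page_to_phys(pages[i]), 0,
                                IOMMU_READ | IOMMU_WRITE);
                if (ret)
                        return ret; /* caller must unwind earlier mappings */
        }
        return 0;
}

Every mapper that doesn't sit behind the common interface carries its own
copy of a loop like this, against its own page-table format.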

The VCM encapsulates this as vcm_back(). This function iterates over a
set of physical regions and maps those physical regions to a virtual
address space that has been associated with a mapper at run-time. The
loop above, the loops in IOMMU code that doesn't use the common
interface (arch/powerpc/kernel/vio.c, for example) and the associated
IOMMU software all do similar work.

In the end the VCM's dynamic virtual region association mechanism (and
multihomed physical memory targeting) allows all IOMMU mapping code in
the system to use the same API.
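
As a sketch of what that association buys (the names follow the call flow
in my earlier mail; dev_vcm stands for a VCM created for a device IOMMU
and already associated and activated per steps 2 and 3 of that flow,
cpu_vcm for the prebuilt kernel VCM, and the zero flag values are
placeholders):

buf = vcm_phys_alloc(MT0, SZ_1M, 0);       /* physical memory chunks */

dev_res = vcm_reserve(dev_vcm, SZ_1M, 0);  /* device-side virtual region */
vcm_back(dev_res, buf);                    /* programs that device's IOMMU */

cpu_res = vcm_reserve(cpu_vcm, SZ_1M, 0);  /* kernel-side region */
vcm_back(cpu_res, buf);                    /* same memory, now CPU-visible */

The same physical allocation is backed into two different mappers through
the same two calls; the mapper-specific page-table work happens underneath.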

This may seem like syntactic sugar, but treating devices with IOMMUs
(bus-masters), devices with MMUs (CPUs) and devices without MMUs (DMA
engines) as endpoints in a mapping graph allows new features to be
developed. One such feature is system-wide memory migration (including
memory that devices map). With a common API, a loop like this can be
written in one place:

foreach mapper of pa_region
        remap(mapper, new_pa_region)

It could also be used for better power-management:

foreach mapper of soon_to_be_powered_off_pa_region
        ask(mapper, soon_to_be_powered_off_pa_region)

The VCM is just the first step.

More concretely, the way the VCM works allows the transparent use and
interoperation of different mapping chunk sizes. This matters for
multimedia devices because IOMMU TLB misses may cause them to miss
their performance goals. Support for multiple chunk sizes has been
added for IOMMU mappers and wouldn't be hard to add to CPU mappers
(which still use 4 KB pages).
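
Roughly, using the flag names from the call flow in my earlier mail (the
zero "let the allocator choose the block mix" case is an assumption about
the allocator, not a quote from the header):

buf_64k   = vcm_phys_alloc(MT0, 2 * SZ_64K, VCM_64KB); /* 64 KB chunks only */
buf_mixed = vcm_phys_alloc(MT0, SZ_16M, 0);   /* allocator may mix 1 MB/64 KB/4 KB */

Either allocation can then be handed to vcm_back() unchanged, the idea
being that the mapper can use larger page-table entries for larger chunks.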

 The general point of the VCMM is to give users a higher-level API than
 the current IOMMU abstraction provides, one that solves the general
 mapping problem. This means that all of the common mapping code would
 be written once. In addition, the API allows all the low-level details
 of IOMMU programming and VM interoperation to be handled at the right
 level.

 Eventually the following functions could all be reworked and their
 users could call VCM functions.
 
 There are more IOMMUs (e.g. x86 has Calgary and GART too). And what is
 the point of converting old IOMMUs (the majority of the below)? Are
 there any potential users of your API for such old IOMMUs?

That's a good question. I gave the list of the current IOMMU mapping
functions to bring awareness to the fact that the general system-wide
mapping 

Re: [RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management

2010-06-26 Thread FUJITA Tomonori
On Thu, 24 Jun 2010 23:48:50 -0700
Zach Pfeffer zpfef...@codeaurora.org wrote:

 Andi Kleen wrote:
  Zach Pfeffer zpfef...@codeaurora.org writes:
  
  This patch contains the documentation for and the main header file of
  the API, termed the Virtual Contiguous Memory Manager. Its use would
  allow all of the IOMMU to VM, VM to device and device to IOMMU
  interoperation code to be refactored into platform independent code.
  
  I read all the description and it's still unclear what advantage
  this all has over the current architecture? 
  
  At least all the benefits mentioned seem to be rather nebulous.
  
  Can you describe a concrete use case that is improved by this code
  directly?
 
 Sure. On a SoC with many IOMMUs (10-100), where each IOMMU may have
 its own set of page-tables or share page-tables, and where devices
 with and without IOMMUs and CPUs with or without MMUs want to
 communicate, an abstraction like the VCM helps manage all conceivable
 mapping topologies. In the same way that the Linux MM manages pages
 apart from page-frames, the VCMM allows the Linux MM to manage ideal
 memory regions, VCMs, apart from the actual memory regions.
 
 One real scenario would be video playback from a file on a memory
 card. To read and display the video, a DMA engine would read blocks of
 data from the memory card controller into memory. These would
 typically be managed using a scatter-gather list. This list would be
 mapped into a contiguous buffer of the video decoder's IOMMU. The
 video decoder would write into a buffer mapped by the display engine's
 IOMMU as well as the CPU (if the kernel needed to intercept the
 buffers). In this instance, the video decoder's IOMMU and the display
 engine's IOMMU use different page-table formats.
 
 Using the VCM API, this topology can be created without worrying about
 the device's IOMMUs or how to map the buffers into the kernel, or how
 to interoperate with the scatter-gather list. The call flow would go:

Can you explain how you can't do the above with the existing API?


 The general point of the VCMM is to give users a higher-level API than
 the current IOMMU abstraction provides, one that solves the general
 mapping problem. This means that all of the common mapping code would
 be written once. In addition, the API allows all the low-level details
 of IOMMU programming and VM interoperation to be handled at the right
 level.
 
 Eventually the following functions could all be reworked and their
 users could call VCM functions.

There are more IOMMUs (e.g. x86 has Calgary and GART too). And what is
the point of converting old IOMMUs (the majority of the below)? Are
there any potential users of your API for such old IOMMUs?


Re: [RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management

2010-06-25 Thread Zach Pfeffer
Andi Kleen wrote:
 Zach Pfeffer zpfef...@codeaurora.org writes:
 
 This patch contains the documentation for and the main header file of
 the API, termed the Virtual Contiguous Memory Manager. Its use would
 allow all of the IOMMU to VM, VM to device and device to IOMMU
 interoperation code to be refactored into platform independent code.
 
 I read all the description and it's still unclear what advantage
 this all has over the current architecture? 
 
 At least all the benefits mentioned seem to be rather nebulous.
 
 Can you describe a concrete use case that is improved by this code
 directly?

Sure. On a SoC with many IOMMUs (10-100), where each IOMMU may have
its own set of page-tables or share page-tables, and where devices
with and without IOMMUs and CPUs with or without MMUs want to
communicate, an abstraction like the VCM helps manage all conceivable
mapping topologies. In the same way that the Linux MM manages pages
apart from page-frames, the VCMM allows the Linux MM to manage ideal
memory regions, VCMs, apart from the actual memory regions.

One real scenario would be video playback from a file on a memory
card. To read and display the video, a DMA engine would read blocks of
data from the memory card controller into memory. These would
typically be managed using a scatter-gather list. This list would be
mapped into a contiguous buffer of the video decoder's IOMMU. The
video decoder would write into a buffer mapped by the display engine's
IOMMU as well as the CPU (if the kernel needed to intercept the
buffers). In this instance, the video decoder's IOMMU and the display
engine's IOMMU use different page-table formats.

Using the VCM API, this topology can be created without worrying about
the device's IOMMUs or how to map the buffers into the kernel, or how
to interoperate with the scatter-gather list. The call flow would go:

1. Establish a memory region for the video decoder and the display engine
that's 128 MB and starts at 0x1000.

vcm_out = vcm_create(0x1000, SZ_128M);


2. Associate the memory region with the video decoder's IOMMU and the
display engine's IOMMU.

avcm_dec = vcm_assoc(vcm_out, video_dec_dev, 0);
avcm_disp = vcm_assoc(vcm_out, disp_dev, 0);

The two dev_ids, video_dec_dev and disp_dev, allow the right low-level
IOMMU functions to be called underneath.


3. Actually program the underlying IOMMUs.

vcm_activate(avcm_dec);
vcm_activate(avcm_disp);


4. Allocate 2 physical buffers that the DMA engine and video decoder will
use. Make sure each buffer is 64 KB contiguous.

buf_64k = vcm_phys_alloc(MT0, 2*SZ_64K, VCM_64KB);


5. Allocate a 16 MB buffer for the output of the video decoder and the
input of the display engine. Use 1MB, 64KB and 4KB blocks to map the
buffer.

buf_frame = vcm_phys_alloc(MT0, SZ_16M);


6. Program the DMA controller.

buf = vcm_get_next_phys_addr(buf_64k, NULL, len);
while (buf) {
        dma_prg(buf);
        buf = vcm_get_next_phys_addr(buf_64k, buf, len);
}


7. Create virtual memory regions for the DMA buffers and the video
decoder output from the vcm_out region. Make sure the buffers are
aligned to the buffer size.

res_64k = vcm_reserve(vcm_out, 8*SZ_64K, VCM_ALIGN_64K);
res_16M = vcm_reserve(vcm_out, SZ_16M, VCM_ALIGN_16M);


8. Connect the virtual reservations with the physical allocations.

vcm_back(res_64k, buf_64k);
vcm_back(res_16M, buf_frame);


9. Program the decoder and the display engine with addresses from the
 IOMMU side of the mapping:

base_64k = vcm_get_dev_addr(res_64k);
base_16M = vcm_get_dev_addr(res_16M);


10. Create a kernel mapping to read and write the 16M buffer.

cpu_vcm = vcm_create_from_prebuilt(VCM_PREBUILT_KERNEL);


11. Create a reservation on that prebuilt VCM. Use any alignment.

res_cpu_16M = vcm_reserve(cpu_vcm, SZ_16M, 0);


12. Back the reservation using the same physical memory that the
decoder and the display engine are looking at.

vcm_back(res_cpu_16M, buf_frame);


13. Get a pointer that the kernel can dereference.

base_cpu_16M = vcm_get_dev_addr(res_cpu_16M);
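
At this point the same physical memory is visible from the decoder, the
display engine and the kernel. As a sketch of what the kernel mapping is
for (frame_copy is just an illustrative destination buffer, not part of
the API):

/* base_cpu_16M came from the kernel-side VCM (step 13), so it can be
 * dereferenced directly; the decoder and display engine use base_16M,
 * their IOMMU-side address for the same physical pages. */
u32 first_word = *(u32 *)base_cpu_16M;
memcpy(frame_copy, (void *)base_cpu_16M, SZ_4K);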


The general point of the VCMM is to give users a higher-level API than
the current IOMMU abstraction provides, one that solves the general
mapping problem. This means that all of the common mapping code would
be written once. In addition, the API allows all the low-level details
of IOMMU programming and VM interoperation to be handled at the right
level.

Eventually the following functions could all be reworked and their
users could call VCM functions.

arch/arm/plat-omap/iovmm.c
map_iovm_area()

arch/m68k/sun3/sun3dvma.c
dvma_map_align()

arch/alpha/kernel/pci_iommu.c
pci_map_single_1()

arch/powerpc/platforms/pasemi/iommu.c
iobmap_build()

arch/powerpc/kernel/iommu.c
iommu_map_page()

arch/sparc/mm/iommu.c
iommu_map_dma_area()

arch/sparc/kernel/pci_sun4v_asm.S
ENTRY(pci_sun4v_iommu_map)

arch/ia64/hp/common/sba_iommu.c
sba_map_page()

arch/arm/mach-omap2/iommu2.c
omap2_iommu_init()

arch/arm/plat-omap/iovmm.c

Re: [RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management

2010-06-24 Thread Andi Kleen
Zach Pfeffer zpfef...@codeaurora.org writes:

 This patch contains the documentation for and the main header file of
 the API, termed the Virtual Contiguous Memory Manager. Its use would
 allow all of the IOMMU to VM, VM to device and device to IOMMU
 interoperation code to be refactored into platform independent code.

I read all the description and it's still unclear what advantage
this all has over the current architecture? 

At least all the benefits mentioned seem to be rather nebulous.

Can you describe a concrete use case that is improved by this code
directly?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


[RFC] mm: iommu: An API to unify IOMMU, CPU and device memory management

2010-06-23 Thread Zach Pfeffer
This patch contains the documentation for and the main header file of
the API, termed the Virtual Contiguous Memory Manager. Its use would
allow all of the IOMMU to VM, VM to device and device to IOMMU
interoperation code to be refactored into platform independent code.

Comments, suggestions and criticisms are welcome and wanted.

Signed-off-by: Zach Pfeffer zpfef...@codeaurora.org
---
 Documentation/vcm.txt |  583 
 include/linux/vcm.h   | 1017 +
 2 files changed, 1600 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/vcm.txt
 create mode 100644 include/linux/vcm.h

diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt
new file mode 100644
index 000..d29c757
--- /dev/null
+++ b/Documentation/vcm.txt
@@ -0,0 +1,583 @@
+What is this document about?
+============================
+
+This document covers how to use the Virtual Contiguous Memory Manager
+(VCMM), how the first implementation works with a specific low-level
+Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used
+from user-space. It also contains a section that describes why something
+like the VCMM is needed in the kernel.
+
+If anything in this document is wrong please send patches to the
+maintainer of this file, listed at the bottom of the document.
+
+
+The Virtual Contiguous Memory Manager
+=====================================
+
+The VCMM was built to solve the system-wide memory mapping issues that
+occur when many bus-masters have IOMMUs.
+
+An IOMMU maps device addresses to physical addresses. It also insulates
+the system from spurious or malicious device bus transactions and allows
+fine-grained mapping attribute control. The Linux kernel core does not
+contain a generic API to handle IOMMU mapped memory; device driver writers
+must implement device specific code to interoperate with the Linux kernel
+core. As the number of IOMMUs increases, coordinating the many address
+spaces mapped by all discrete IOMMUs becomes difficult without in-kernel
+support.
+
+The VCMM API enables device-independent IOMMU control, virtual memory
+manager (VMM) interoperation and non-IOMMU-enabled device interoperation
+by treating devices with or without IOMMUs, and all CPUs with or without
+MMUs, along with their mapping contexts and mappings, using common
+abstractions. Physical hardware is given a generic device type, and mapping
+contexts are abstracted into Virtual Contiguous Memory (VCM)
+regions. Users reserve memory from VCMs and back their reservations
+with physical memory.
+
+Why the VCMM is Needed
+----------------------
+
+Driver writers who control devices with IOMMUs must contend with device
+control and memory management. They have a large device driver API that
+they can leverage to control their devices, but they lack a unified API
+to help them program mappings into IOMMUs and share those mappings with
+other devices and CPUs in the system.
+
+Sharing is complicated by Linux's CPU centric VMM. The CPU centric model
+generally makes sense because average hardware only contains an MMU for the
+CPU and possibly a graphics MMU. If every device in the system has one or
+more MMUs, the CPU centric memory management (MM) programming model breaks
+down.
+
+Abstracting IOMMU device programming into a common API has already begun
+in the Linux kernel. It was built to abstract the differences between AMD's
+and Intel's IOMMUs to support x86 virtualization on both platforms. The
+interface is listed in include/linux/iommu.h. It contains interfaces for
+mapping and unmapping as well as domain management. This interface has not
+gained widespread use outside of x86: the PA-RISC, Alpha and SPARC
+architectures, and the ARM and PowerPC platforms, all use their own
+mapping modules to control their IOMMUs. The VCMM contains an IOMMU
+programming layer, but since its abstraction supports map management
+independent of device control, the layer is not used directly. This
+higher-level view enables a new kernel service, not just an IOMMU
+interoperation layer.
+
+The General Idea: Map Management using Graphs
+---------------------------------------------
+
+Looking at mapping from a system-wide perspective reveals a general graph
+problem. The VCMM's API is built to manage the general mapping graph. Each
+node that talks to memory, either through an MMU or directly (physically
+mapped), can be thought of as the device end of a mapping edge. The other
+end of the edge is the physical memory (or intermediate virtual space) that
+is mapped.
+
+In the direct mapped case the device is assigned a one-to-one MMU. This
+scheme allows direct mapped devices to participate in general graph
+management.
+
+The CPU nodes can also be brought under the same mapping abstraction with
+the use of a light overlay on the existing VMM. This light overlay allows
+VMM managed mappings to interoperate with the common API. The light
+overlay enables this without substantial