date:20140923

Re: [PATCH] kvm: don't take vcpu mutex for obviously invalid vcpu ioctls

2014-09-23 Thread Gleb Natapov

On Mon, Sep 22, 2014 at 09:29:19PM +0200, Paolo Bonzini wrote:
> On 22/09/2014 21:20, Christian Borntraeger wrote:
> > while using trinity to fuzz KVM, we noticed long stalls on invalid ioctls.
> > Let's bail out early on invalid ioctls. or similar?
>
> Okay.  David, can you explain how you found it so that I can make up my
> mind?
>
> Gleb and Marcelo, a fourth and fifth opinion? :)

I agree with Christian that the simpler fix is better here.
The overhead is minimal. If we ever notice this overhead
we can revert the patch altogether, since the problem it
fixes can only be inflicted on userspace by itself and there
are myriad other ways userspace can hurt itself.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
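
The early bail-out discussed in this thread can be sketched roughly as follows. This is a minimal userspace illustration, not the patch itself: sketch_vcpu_ioctl(), handle_locked() and SKETCH_KVMIO are hypothetical names, and it assumes the check keys off the ioctl type field as extracted by _IOC_TYPE(). The point is only that a cheap check runs before any lock is taken.

```c
#include <assert.h>
#include <errno.h>
#include <linux/ioctl.h>   /* _IO(), _IOC_TYPE() */

/* Mirrors the value of KVMIO in <linux/kvm.h>; redefined here so the
 * sketch stands alone. */
#define SKETCH_KVMIO 0xAE

/* Counts how often the (pretend) locked path was entered. */
static int locked_calls;

/* Hypothetical stand-in for the part of the handler that would run
 * with the vcpu mutex held. */
static long handle_locked(unsigned int cmd)
{
	(void)cmd;
	locked_calls++;
	return 0;
}

/* Reject ioctls whose type field cannot belong to KVM before doing
 * anything that might block. */
long sketch_vcpu_ioctl(unsigned int cmd)
{
	if (_IOC_TYPE(cmd) != SKETCH_KVMIO)
		return -EINVAL;
	return handle_locked(cmd);
}
```

An ioctl number built with a foreign type, such as _IO('T', 0x01), is rejected without ever reaching handle_locked(), which is why a fuzzer throwing random ioctls no longer waits on the lock.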

[PATCH] KVM: x86: Remove debug assertion of non-PAE reserved bits

2014-09-23 Thread Nadav Amit

Commit 346874c9507a ("KVM: x86: Fix CR3 reserved bits") removed non-PAE
reserved bits which were not according to Intel SDM.  However, residue was left
in a debug assertion (CR3_NONPAE_RESERVED_BITS).  Remove it.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/paging_tmpl.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0ab6c65..806d58e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -298,8 +298,7 @@ retry_walk:
 	}
 #endif
 	walker->max_level = walker->level;
-	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
+	ASSERT(!is_long_mode(vcpu) && is_pae(vcpu));
 
 	accessed_dirty = PT_GUEST_ACCESSED_MASK;
 	pt_access = pte_access = ACC_ALL;
-- 
1.9.1

Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Paolo Bonzini

On 22/09/2014 23:54, Andres Lagar-Cavilla wrote:
> @@ -1406,32 +1406,24 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	struct rmap_iterator uninitialized_var(iter);
>  	int young = 0;
>  
> -	/*
> -	 * In case of absence of EPT Access and Dirty Bits supports,
> -	 * emulate the accessed bit for EPT, by checking if this page has
> -	 * an EPT mapping, and clearing it if it does. On the next access,
> -	 * a new EPT mapping will be established.
> -	 * This has some overhead, but not as much as the cost of swapping
> -	 * out actively used pages or breaking up actively used hugepages.
> -	 */
> -	if (!shadow_accessed_mask) {
> -		young = kvm_unmap_rmapp(kvm, rmapp, slot, data);
> -		goto out;
> -	}
> +	BUG_ON(!shadow_accessed_mask);
>  
>  	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
>  	     sptep = rmap_get_next(&iter)) {
> +		struct kvm_mmu_page *sp;
> +		gfn_t gfn;
>  		BUG_ON(!is_shadow_present_pte(*sptep));
> +		/* From spte to gfn. */
> +		sp = page_header(__pa(sptep));
> +		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>  
>  		if (*sptep & shadow_accessed_mask) {
>  			young = 1;
>  			clear_bit((ffs(shadow_accessed_mask) - 1),
>  				 (unsigned long *)sptep);
>  		}
> +		trace_kvm_age_page(gfn, slot, young);

Yesterday I couldn't think of a way to avoid the
page_header/kvm_mmu_page_get_gfn on every iteration, but it's actually
not hard.  Instead of passing hva as datum, you can pass (unsigned long)
start.  Then you can add PAGE_SIZE to it at the end of every call to
kvm_age_rmapp, and keep the old tracing logic.


Paolo
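
The suggestion above, carrying a running address in the datum and advancing it by one page per call, might look something like the sketch below. It is a standalone illustration under assumptions: sketch_age_page(), sketch_age_range(), last_traced and the 4 KiB SKETCH_PAGE_SIZE are hypothetical, and the real handler signature in the kernel differs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed 4 KiB pages */

/* Stand-in for the trace point: remembers the last address reported. */
static unsigned long last_traced;

/* Hypothetical per-page callback. *cursor is the address of the page
 * handled by this call; it is advanced at the end so that the next
 * call sees the next page without any spte-to-gfn lookup. */
static int sketch_age_page(unsigned long *cursor, int accessed)
{
	last_traced = *cursor;
	*cursor += SKETCH_PAGE_SIZE;
	return accessed;
}

/* Hypothetical range walker: one callback invocation per page. */
int sketch_age_range(unsigned long start, const int *accessed, size_t npages)
{
	unsigned long cursor = start;
	int young = 0;
	size_t i;

	for (i = 0; i < npages; i++)
		young |= sketch_age_page(&cursor, accessed[i]);
	return young;
}
```

The trade-off is that the callback must be invoked exactly once per page and in address order for the cursor to stay in step with the page actually being aged.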

Re: [PATCH] x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only

2014-09-23 Thread Paolo Bonzini

On 22/09/2014 21:43, Borislav Petkov wrote:
> > On x86_64, kernel text mappings are mapped read-only with
> > CONFIG_DEBUG_RODATA.
> Hmm, that depends on DEBUG_KERNEL.
>
> I think you're actually talking about distro kernels which enable
> CONFIG_DEBUG_RODATA, right?

This is for guest kernels, so it's not necessarily distro kernels.
Anyone who compiles their kernel with CONFIG_DEBUG_RODATA + PV spinlocks
would not be able to run it on AMD.

Paolo

Re: [PATCH] KVM: x86: Remove debug assertion of non-PAE reserved bits

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 09:01, Nadav Amit wrote:
> Commit 346874c9507a ("KVM: x86: Fix CR3 reserved bits") removed non-PAE
> reserved bits which were not according to Intel SDM.  However, residue was left
> in a debug assertion (CR3_NONPAE_RESERVED_BITS).  Remove it.
>
> Signed-off-by: Nadav Amit na...@cs.technion.ac.il
> ---
>  arch/x86/kvm/paging_tmpl.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 0ab6c65..806d58e 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -298,8 +298,7 @@ retry_walk:
>  	}
>  #endif
>  	walker->max_level = walker->level;
> -	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
> -	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
> +	ASSERT(!is_long_mode(vcpu) && is_pae(vcpu));
>  
>  	accessed_dirty = PT_GUEST_ACCESSED_MASK;
>  	pt_access = pte_access = ACC_ALL;
>

Thanks, applied.

Paolo

Re: [PATCH] x86, kvm: use macros to compute bank MSRs

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 04:44, Chen Yucong wrote:
> Avoid open coded calculations for bank MSRs by using well-defined
> macros that hide the index of higher bank MSRs.
>
> No semantic changes.
>
> Signed-off-by: Chen Yucong sla...@gmail.com
> ---
>  arch/x86/kvm/x86.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 31e55ae..e8c1e3b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1825,7 +1825,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  		break;
>  	default:
>  		if (msr >= MSR_IA32_MC0_CTL &&
> -		    msr < MSR_IA32_MC0_CTL + 4 * bank_num) {
> +		    msr < MSR_IA32_MCx_CTL(bank_num)) {
>  			u32 offset = msr - MSR_IA32_MC0_CTL;
>  			/* only 0 or all 1s can be written to IA32_MCi_CTL
>  			 * some Linux kernels though clear bit 10 in bank 4 to
> @@ -2184,7 +2184,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
> -	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
> +	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
>  		return set_msr_mce(vcpu, msr, data);
>  
>  	/* Performance counters are not protected by a CPUID bit,
> @@ -2350,7 +2350,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>  		break;
>  	default:
>  		if (msr >= MSR_IA32_MC0_CTL &&
> -		    msr < MSR_IA32_MC0_CTL + 4 * bank_num) {
> +		    msr < MSR_IA32_MCx_CTL(bank_num)) {
>  			u32 offset = msr - MSR_IA32_MC0_CTL;
>  			data = vcpu->arch.mce_banks[offset];
>  			break;
> @@ -2531,7 +2531,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>  	case MSR_IA32_MCG_CAP:
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
> -	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
> +	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
>  		return get_msr_mce(vcpu, msr, pdata);
>  	case MSR_K7_CLK_CTL:
>  		/*
>

Thanks, applied.

Paolo
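
For reference, the arithmetic the macro hides can be sketched as below. The SKETCH_ names are local stand-ins so the snippet compiles on its own; it assumes the usual x86 layout in which IA32_MC0_CTL is MSR 0x400 and each bank owns four consecutive MSRs, so that MCx_CTL(bank_num) is the first MSR past the last implemented bank.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for MSR_IA32_MC0_CTL and MSR_IA32_MCx_CTL(x). */
#define SKETCH_MC0_CTL     0x400u
#define SKETCH_MCx_CTL(x)  (SKETCH_MC0_CTL + 4u * (x))

/* Returns nonzero if msr lies in the register window of banks
 * [0, bank_num), using the macro as an exclusive upper bound. */
int sketch_is_bank_msr(uint32_t msr, unsigned int bank_num)
{
	return msr >= SKETCH_MC0_CTL && msr < SKETCH_MCx_CTL(bank_num);
}
```

With four banks the window is 0x400 through 0x40f, so 0x40d is accepted while 0x410 is not.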

Re: [PATCH] kvm: don't take vcpu mutex for obviously invalid vcpu ioctls

2014-09-23 Thread Christian Borntraeger

On 09/23/2014 08:49 AM, Gleb Natapov wrote:
> On Mon, Sep 22, 2014 at 09:29:19PM +0200, Paolo Bonzini wrote:
> > On 22/09/2014 21:20, Christian Borntraeger wrote:
> > > while using trinity to fuzz KVM, we noticed long stalls on invalid ioctls.
> > > Let's bail out early on invalid ioctls. or similar?
> >
> > Okay.  David, can you explain how you found it so that I can make up my
> > mind?
> >
> > Gleb and Marcelo, a fourth and fifth opinion? :)
>
> I agree with Christian that the simpler fix is better here.
> The overhead is minimal. If we ever notice this overhead
> we can revert the patch altogether, since the problem it
> fixes can only be inflicted on userspace by itself and there
> are myriad other ways userspace can hurt itself.


Yes. David's explanation also makes sense as a commit message. Paolo, if you use
David's patch with a better description of the why, I am fine with this patch.

Christian


[PATCH kvm-kmod] adjust timekeeping compatibility code

2014-09-23 Thread Paolo Bonzini

kvm_get_xtime_nsec could overflow.  If we make kvm_get_boot_base_ns
compute the equivalent of 3.17's base_mono+offs_boot formula (instead of
just offs_boot), we can avoid that and drop kvm_get_xtime_nsec altogether.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 external-module-compat-comm.h |  3 +--
 external-module-compat.c      | 19 ++++---------------
 sync                          |  4 +---
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
index b6a2894..fdcd8e4 100644
--- a/external-module-compat-comm.h
+++ b/external-module-compat-comm.h
@@ -1412,9 +1412,8 @@ static inline void guest_exit(void)
  */
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3,17,0)
 extern u64 ktime_get_boot_ns(void);
-extern u64 kvm_get_boot_base_ns(void);
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,0)
 struct timekeeper;
-extern u64 kvm_get_xtime_nsec(struct timekeeper *tk);
+extern u64 kvm_get_boot_base_ns(struct timekeeper *tk);
 #endif
 #endif
diff --git a/external-module-compat.c b/external-module-compat.c
index eb7bc62..38717b6 100644
--- a/external-module-compat.c
+++ b/external-module-compat.c
@@ -350,26 +350,15 @@ u64 ktime_get_boot_ns(void)
 	return timespec_to_ns(&ts);
 }
 
-u64 kvm_get_boot_base_ns(void)
-{
-	struct timespec ts = { 0, 0 };
-
-	kvm_monotonic_to_bootbased(&ts);
-	return timespec_to_ns(&ts);
-}
-
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,0)
 #include <linux/timekeeper_internal.h>
 
-u64 kvm_get_xtime_nsec(struct timekeeper *tk)
+u64 kvm_get_boot_base_ns(struct timekeeper *tk)
 {
-	u64 monotonic_time_sec =
-		tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
-	u64 monotonic_time_snsec =
-		tk->xtime_nsec + (tk->wall_to_monotonic.tv_nsec << tk->shift);
+	struct timespec ts = tk->wall_to_monotonic;
 
-	return ((monotonic_time_sec * (u64)NSEC_PER_SEC) << tk->shift) +
-		monotonic_time_snsec;
+	kvm_monotonic_to_bootbased(&ts);
+	return timespec_to_ns(&ts) + tk->xtime_sec * (u64)NSEC_PER_SEC;
 }
 #endif
 #endif
diff --git a/sync b/sync
index 0af0399..8b63ca7 100755
--- a/sync
+++ b/sync
@@ -306,9 +306,7 @@ def hack_content(fname, data):
 if match(r'tkr\.cycle_last') or match(r'tkr\.mask'):
 w(sub(r'tkr\.', 'clock->', line))
 elif match(r'tkr\.base_mono'):
-w('\tboot_ns = kvm_get_boot_base_ns();')
-elif match(r'tkr\.xtime_nsec'):
-w(sub(r'tk->tkr\.xtime_nsec', 'kvm_get_xtime_nsec(tk)', line))
+w('\tboot_ns = kvm_get_boot_base_ns(tk);')
 else:
 w(sub(r'tkr\.', '', line))
 line = '#endif'
-- 
1.8.3.1
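
The overflow mentioned in the commit message presumably comes from shifting a full nanosecond count left by the clocksource shift. The sketch below is a standalone illustration of that arithmetic only; sketch_shifted_ns_overflows() is a hypothetical helper, and the shift of 22 used in the example is an assumed value, not one taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Returns 1 if (sec * NSEC_PER_SEC) << shift would lose high bits in a
 * 64-bit value. Assumes sec * NSEC_PER_SEC itself still fits in 64 bits. */
int sketch_shifted_ns_overflows(uint64_t sec, unsigned int shift)
{
	uint64_t ns = sec * SKETCH_NSEC_PER_SEC;

	if (shift == 0 || shift >= 64)
		return 0;
	/* Any bit set in the top 'shift' bits is shifted out. */
	return (ns >> (64 - shift)) != 0;
}
```

With a shift of 22 only 42 bits remain for the nanosecond count, which is exhausted after roughly 4400 seconds. Keeping the value in plain nanoseconds, as the reworked kvm_get_boot_base_ns() does, avoids the shift entirely.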


Re: [PATCH] kvm: don't take vcpu mutex for obviously invalid vcpu ioctls

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 10:06, Christian Borntraeger wrote:
> Yes. David's explanation also makes sense as a commit message. Paolo,
> if you use David's patch with a better description of the why, I am
> fine with this patch.

Done, thanks everybody!

Paolo

Re: [PATCH] x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only

2014-09-23 Thread Borislav Petkov

On Tue, Sep 23, 2014 at 10:00:12AM +0200, Paolo Bonzini wrote:
> On 22/09/2014 21:43, Borislav Petkov wrote:
> > > On x86_64, kernel text mappings are mapped read-only with
> > > CONFIG_DEBUG_RODATA.
> > Hmm, that depends on DEBUG_KERNEL.
> >
> > I think you're actually talking about distro kernels which enable
> > CONFIG_DEBUG_RODATA, right?
>
> This is for guest kernels, so it's not necessarily distro kernels.
> Anyone who compiles their kernel with CONFIG_DEBUG_RODATA + PV spinlocks
> would not be able to run it on AMD.

I see. Yeah, so the patch makes sense to me:

Acked-by: Borislav Petkov b...@suse.de

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [RFC patch 0/6] vfio based pci pass-through for qemu/KVM on s390

2014-09-23 Thread Alexander Graf



On 23.09.14 00:28, Alex Williamson wrote:
> On Tue, 2014-09-23 at 00:08 +0200, Alexander Graf wrote:
> >
> > On 22.09.14 22:47, Alex Williamson wrote:
> > > On Fri, 2014-09-19 at 13:54 +0200, frank.blasc...@de.ibm.com wrote:
> > > > This set of patches implements a vfio based solution for pci
> > > > pass-through on the s390 platform. The kernel stuff is pretty
> > > > much straightforward, but qemu needs more work.
> > > >
> > > > Most interesting patch is:
> > > >   vfio: make vfio run on s390 platform
> > > >
> > > > I hope Alex & Alex can give me some guidance how to do the changes
> > > > in an appropriate way. After creating a separate iommu address space
> > > > for each attached PCI device I can successfully run the vfio type1
> > > > iommu. So if we could extend type1 not registering all guest memory
> > > > (see patch) I think we do not need a special vfio iommu for s390
> > > > for the moment.
> > > >
> > > > The patches implement the base pass-through support. s390 specific
> > > > virtualization functions are currently not included. This would
> > > > be a second step after the base support is done.
> > > >
> > > > kernel patches apply to linux-kvm-next
> > > >
> > > > KVM: s390: Enable PCI instructions
> > > > iommu: add iommu for s390 platform
> > > > vfio: make vfio build on s390
> > > >
> > > > qemu patches apply to qemu-master
> > > >
> > > > s390: Add PCI bus support
> > > > s390: implement pci instruction
> > > > vfio: make vfio run on s390 platform
> > > >
> > > > Thx for feedback and review comments
> > >
> > > Sending patches as attachments makes it difficult to comment inline.
> > >
> > > 2/6
> > >  - careful of the namespace as you're changing functions from static and
> > > exporting them
> > >  - doesn't seem like functions need to be exported, just non-static to
> > > call from s390-iommu.c
> > >
> > > 6/6
> > >  - We shouldn't need to globally disable mmap, each VFIO region reports
> > > whether it supports mmap and vfio-pci on s390 should indicate mmap is
> > > not supported on the platform.
> >
> > Can we emulate MMIO on mmap'ed regions by routing every memory access
> > via the kernel? It'd be slow, but at least make existing VFIO code
> > compatible.
>
> Isn't that effectively what we do when we use memory_region_init_io() vs
> memory_region_init_ram_ptr() or are you suggesting something that can
> handle the MMIO without bouncing out to QEMU?  VFIO is already
> compatible with regions that cannot be mmap'd, the kernel just needs to
> report it as such.  Thanks,

Ah, cool. I guess I missed that part :). Then all is well.


Alex

Re: [PATCH] KVM: EVENTFD: Only conditionally remove inclusion of irq.h

2014-09-23 Thread Paolo Bonzini

On 22/09/2014 23:33, Christoffer Dall wrote:
> Commit c77dcac ("KVM: Move more code under CONFIG_HAVE_KVM_IRQFD") added
> functionality that depends on definitions in ioapic.h when
> __KVM_HAVE_IOAPIC is defined.
>
> At the same time, 0ba0951 ("KVM: EVENTFD: remove inclusion of irq.h")
> removed the inclusion of irq.h unconditionally, which happened to
> include ioapic.h.
>
> Instead, include ioapic.h directly in eventfd.c if __KVM_HAVE_IOAPIC is
> defined.
>
> Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
> ---
>  virt/kvm/eventfd.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 0c712a7..b0fb390 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -36,6 +36,9 @@
>  #include <linux/seqlock.h>
>  #include <trace/events/kvm.h>
>  
> +#ifdef __KVM_HAVE_IOAPIC
> +#include "ioapic.h"
> +#endif
>  #include "iodev.h"
>  
>  #ifdef CONFIG_HAVE_KVM_IRQFD
>

Applied, thanks.

Paolo

[GIT PULL] Another KVM fix for 3.17

2014-09-23 Thread Paolo Bonzini

Linus,

The following changes since commit f3670394c29ff3730638762c1760fd2f624e6d7b:

  Revert "x86/efi: Fixup GOT in all boot code paths" (2014-09-22 23:05:49 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 954263938706bf62d36e81b6b49f313390f2ed35:

  Merge tag 'kvm-arm-for-v3.17-rc7-or-final' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master 
(2014-09-23 15:18:02 +0200)



Another fix for 3.17 arrived at just the wrong time, after I had sent
yesterday's pull request.  Normally I would have waited for
some other patches to pile up, but since 3.17 might be short
here it is.


Christoffer Dall (1):
  arm/arm64: KVM: Fix unaligned access bug on gicv2 access

Paolo Bonzini (1):
  Merge tag 'kvm-arm-for-v3.17-rc7-or-final' of 
git://git.kernel.org/.../kvmarm/kvmarm into kvm-master

 virt/kvm/arm/vgic-v2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[PATCHv7 00/26] VFIO support for platform and AMBA devices on ARM

2014-09-23 Thread Antonios Motakis

This patch series aims to implement VFIO support for platform devices that
reside behind an IOMMU. Examples of such devices are devices behind an ARM
SMMU, or behind a Samsung Exynos System MMU.

This version of the patch series introduces numerous fixes and cleanups based
on the feedback received on the previous version, and also features support
for ARM AMBA devices.

The API used is based on the existing VFIO API that is also used with PCI
devices. Only devices that include a basic set of IRQs and memory regions are
targeted; devices with complex relationships with other devices on a device
tree are not taken into account at this stage.

This patch series is based on Linux 3.17-rc5 and can be cloned from the
branch vfio-platform-v7 at g...@github.com:virtualopensystems/linux-kvm-arm.git

Changes since v6:
 - Integrated support for AMBA devices
 - Numerous cleanups and fixes
Changes since v5:
 - Full eventfd support for IRQ masking and unmasking.
 - Changed IOMMU_EXEC to IOMMU_NOEXEC, along with related flags in VFIO.
 - Other fixes based on reviewer comments.
Changes since v4:
 - Use static offsets for each region in the VFIO device fd
 - Include patch in the series for the ARM SMMU to expose IOMMU_EXEC
   availability via IOMMU_CAP_DMA_EXEC
 - Rebased on VFIO multi domain support:
   - IOMMU_EXEC is now available if at least one IOMMU in the container
 supports it
   - Expose IOMMU_EXEC if available via the capability VFIO_IOMMU_PROT_EXEC
 - Some bug fixes
Changes since v3:
 - Use Kim Phillips' driver_probe_device()
Changes since v2:
 - Fixed Read/Write and MMAP on device regions
 - Removed dependency on Device Tree
 - Interrupts support
 - Interrupt masking/unmasking
 - Automask level sensitive interrupts
 - Introduced VFIO_DMA_MAP_FLAG_EXEC
 - Code clean ups

Antonios Motakis (26):
  iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC
  iommu: add capability IOMMU_CAP_NOEXEC
  iommu/arm-smmu: add IOMMU_CAP_NOEXEC to the ARM SMMU driver
  vfio/iommu_type1: support for platform bus devices on ARM
  vfio: introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag
  vfio/iommu_type1: implement the VFIO_DMA_MAP_FLAG_NOEXEC flag
  driver core: amba: add device binding path 'driver_override'
  driver core: amba: add documentation for binding path
'driver_override'
  vfio/platform: initial skeleton of VFIO support for platform devices
  vfio: platform: probe to devices on the platform bus
  vfio: platform: add the VFIO PLATFORM module to Kconfig
  vfio: amba: VFIO support for AMBA devices
  vfio: amba: add the VFIO for AMBA devices module to Kconfig
  vfio/platform: return info for bound device
  vfio/platform: return info for device memory mapped IO regions
  vfio/platform: read and write support for the device fd
  vfio/platform: support MMAP of MMIO regions
  vfio/platform: return IRQ info
  vfio/platform: initial interrupts support code
  vfio/platform: trigger an interrupt via eventfd
  vfio/platform: support for maskable and automasked interrupts
  vfio: move eventfd support code for VFIO_PCI to a separate file
  vfio: add local lock in virqfd instead of depending on VFIO PCI
  vfio: pass an opaque pointer on virqfd initialization
  vfio: initialize the virqfd workqueue in VFIO generic code
  vfio/platform: implement IRQ masking/unmasking via an eventfd

 Documentation/ABI/testing/sysfs-bus-amba      |  20 ++
 drivers/amba/bus.c                            |  44 +++
 drivers/iommu/arm-smmu.c                      |  11 +-
 drivers/vfio/Kconfig                          |   3 +-
 drivers/vfio/Makefile                         |   5 +-
 drivers/vfio/pci/vfio_pci.c                   |   8 -
 drivers/vfio/pci/vfio_pci_intrs.c             | 234 +-
 drivers/vfio/pci/vfio_pci_private.h           |   3 -
 drivers/vfio/platform/Kconfig                 |  19 ++
 drivers/vfio/platform/Makefile                |   8 +
 drivers/vfio/platform/vfio_amba.c             | 108 +++
 drivers/vfio/platform/vfio_platform.c         |  96 ++
 drivers/vfio/platform/vfio_platform_common.c  | 432 ++
 drivers/vfio/platform/vfio_platform_irq.c     | 354 +
 drivers/vfio/platform/vfio_platform_private.h |  77 +
 drivers/vfio/vfio.c                           |   8 +
 drivers/vfio/vfio_iommu_type1.c               |  38 ++-
 drivers/vfio/virqfd.c                         | 215 +
 include/linux/amba/bus.h                      |   1 +
 include/linux/iommu.h                         |   3 +-
 include/linux/vfio.h                          |  27 ++
 include/uapi/linux/vfio.h                     |   4 +
 22 files changed, 1477 insertions(+), 241 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-amba
 create mode 100644 drivers/vfio/platform/Kconfig
 create mode 100644 drivers/vfio/platform/Makefile
 create mode 100644 drivers/vfio/platform/vfio_amba.c
 create mode 100644 drivers/vfio/platform/vfio_platform.c
 create mode 100644

[PATCHv7 17/26] vfio/platform: support MMAP of MMIO regions

2014-09-23 Thread Antonios Motakis

Allow memory mapping of the device's MMIO regions so that userspace can
access them directly.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_common.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 589b226..5551d32 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -50,6 +50,11 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
 		vdev->regions[i].size = resource_size(res);
 		vdev->regions[i].flags = VFIO_REGION_INFO_FLAG_READ
 					| VFIO_REGION_INFO_FLAG_WRITE;
+		/* Only regions addressed with PAGE granularity may be MMAPed
+		 * securely. */
+		if (!(vdev->regions[i].addr & ~PAGE_MASK)
+				&& !(vdev->regions[i].size & ~PAGE_MASK))
+			vdev->regions[i].flags |= VFIO_REGION_INFO_FLAG_MMAP;
 	}
 
 	vdev->num_regions = cnt;
@@ -281,7 +286,40 @@ err:
 
 static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct vfio_platform_device *vdev = device_data;
+	unsigned int index;
+	u64 req_len, pgoff, req_start;
+	struct vfio_platform_region regions;
+
+	index = vma->vm_pgoff >> (VFIO_PLATFORM_OFFSET_SHIFT - PAGE_SHIFT);
+
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+	if ((vma->vm_flags & VM_SHARED) == 0)
+		return -EINVAL;
+	if (index >= vdev->num_regions)
+		return -EINVAL;
+	if (vma->vm_start & ~PAGE_MASK)
+		return -EINVAL;
+	if (vma->vm_end & ~PAGE_MASK)
+		return -EINVAL;
+
+	regions = vdev->regions[index];
+
+	req_len = vma->vm_end - vma->vm_start;
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PLATFORM_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+	req_start = pgoff << PAGE_SHIFT;
+
+	if (regions.size < PAGE_SIZE || req_start + req_len > regions.size)
+		return -EINVAL;
+
+	vma->vm_private_data = vdev;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_pgoff = (regions.addr >> PAGE_SHIFT) + pgoff;
+
+	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			       req_len, vma->vm_page_prot);
 }
 
 static const struct vfio_device_ops vfio_platform_ops = {
-- 
1.8.3.2
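
The two pieces of arithmetic in this patch, the page-granularity test and the split of the mmap page offset into a region index and an offset within the region, can be tried out in isolation with the sketch below. All SKETCH_ names are hypothetical; in particular the 4 KiB page size and the value 40 standing in for VFIO_PLATFORM_OFFSET_SHIFT are assumptions made for the example, not values taken from the series.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12                         /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE    (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK    (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_OFFSET_SHIFT 40                         /* assumed region shift */

/* A region may be mmap()ed only if both its start address and its size
 * are multiples of the page size. */
int sketch_region_mmapable(uint64_t addr, uint64_t size)
{
	return !(addr & ~SKETCH_PAGE_MASK) && !(size & ~SKETCH_PAGE_MASK);
}

/* High bits of the page offset select the region. */
unsigned int sketch_region_index(uint64_t vm_pgoff)
{
	return (unsigned int)(vm_pgoff >> (SKETCH_OFFSET_SHIFT - SKETCH_PAGE_SHIFT));
}

/* Low bits give the page offset inside that region. */
uint64_t sketch_page_in_region(uint64_t vm_pgoff)
{
	return vm_pgoff & ((1ULL << (SKETCH_OFFSET_SHIFT - SKETCH_PAGE_SHIFT)) - 1);
}
```

A region that starts or ends in the middle of a page is left without the MMAP flag because mapping it would expose whatever else shares that page to userspace.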


[PATCHv7 10/26] vfio: platform: probe to devices on the platform bus

2014-09-23 Thread Antonios Motakis

Driver to bind to Linux platform devices, and callbacks to discover their
resources to be used by the main VFIO PLATFORM code.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform.c | 96 +++
 include/uapi/linux/vfio.h |  1 +
 2 files changed, 97 insertions(+)
 create mode 100644 drivers/vfio/platform/vfio_platform.c

diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
new file mode 100644
index 000..024c026
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis a.mota...@virtualopensystems.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+
+#include "vfio_platform_private.h"
+
+#define DRIVER_VERSION  "0.7"
+#define DRIVER_AUTHOR   "Antonios Motakis <a.mota...@virtualopensystems.com>"
+#define DRIVER_DESC     "VFIO for platform devices - User Level meta-driver"
+
+/* probing devices from the linux platform bus */
+
+static struct resource *get_platform_resource(struct vfio_platform_device *vdev,
+					      int i)
+{
+	struct platform_device *pdev = (struct platform_device *) vdev->opaque;
+
+	return platform_get_resource(pdev, IORESOURCE_MEM, i);
+}
+
+static int get_platform_irq(struct vfio_platform_device *vdev, int i)
+{
+	struct platform_device *pdev = (struct platform_device *) vdev->opaque;
+
+	return platform_get_irq(pdev, i);
+}
+
+
+static int vfio_platform_probe(struct platform_device *pdev)
+{
+   struct vfio_platform_device *vdev;
+   int ret;
+
+   vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+   if (!vdev)
+   return -ENOMEM;
+
+	vdev->opaque = (void *) pdev;
+	vdev->name = pdev->name;
+	vdev->flags = VFIO_DEVICE_FLAGS_PLATFORM;
+	vdev->get_resource = get_platform_resource;
+	vdev->get_irq = get_platform_irq;
+
+	ret = vfio_platform_probe_common(vdev, &pdev->dev);
+   if (ret)
+   kfree(vdev);
+
+   return ret;
+}
+
+static int vfio_platform_remove(struct platform_device *pdev)
+{
+	return vfio_platform_remove_common(&pdev->dev);
+}
+
+static struct platform_driver vfio_platform_driver = {
+   .probe  = vfio_platform_probe,
+   .remove = vfio_platform_remove,
+   .driver = {
+		.name	= "vfio-platform",
+   .owner  = THIS_MODULE,
+   },
+};
+
+module_platform_driver(vfio_platform_driver);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 30f630c..b022a25 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -158,6 +158,7 @@ struct vfio_device_info {
__u32   flags;
 #define VFIO_DEVICE_FLAGS_RESET	(1 << 0)	/* Device supports reset */
 #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
+#define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* vfio-platform device */
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
 };
-- 
1.8.3.2


[PATCHv7 25/26] vfio: initialize the virqfd workqueue in VFIO generic code

2014-09-23 Thread Antonios Motakis

Now we have finally completely decoupled virqfd from VFIO_PCI. We can
initialize it from the VFIO generic code, in order to safely use it from
multiple independent VFIO bus drivers.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/pci/vfio_pci.c | 8 --------
 drivers/vfio/vfio.c         | 8 ++++++++
 drivers/vfio/virqfd.c       | 4 ++--
 include/linux/vfio.h        | 4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index f782533..40e176d 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1034,7 +1034,6 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 static void __exit vfio_pci_cleanup(void)
 {
 	pci_unregister_driver(&vfio_pci_driver);
-   vfio_pci_virqfd_exit();
vfio_pci_uninit_perm_bits();
 }
 
@@ -1047,11 +1046,6 @@ static int __init vfio_pci_init(void)
if (ret)
return ret;
 
-   /* Start the virqfd cleanup handler */
-   ret = vfio_pci_virqfd_init();
-   if (ret)
-   goto out_virqfd;
-
/* Register and scan for devices */
 	ret = pci_register_driver(&vfio_pci_driver);
if (ret)
@@ -1060,8 +1054,6 @@ static int __init vfio_pci_init(void)
return 0;
 
 out_driver:
-   vfio_pci_virqfd_exit();
-out_virqfd:
vfio_pci_uninit_perm_bits();
return ret;
 }
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index f018d8d..8e84471 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1464,6 +1464,11 @@ static int __init vfio_init(void)
if (ret)
goto err_cdev_add;
 
+   /* Start the virqfd cleanup handler used by some VFIO bus drivers */
+   ret = vfio_virqfd_init();
+   if (ret)
+   goto err_virqfd;
+
pr_info(DRIVER_DESC  version:  DRIVER_VERSION \n);
 
/*
@@ -1476,6 +1481,8 @@ static int __init vfio_init(void)
 
return 0;
 
+err_virqfd:
+   cdev_del(vfio.group_cdev);
 err_cdev_add:
unregister_chrdev_region(vfio.group_devt, MINORMASK);
 err_alloc_chrdev:
@@ -1490,6 +1497,7 @@ static void __exit vfio_cleanup(void)
 {
WARN_ON(!list_empty(vfio.group_list));
 
+   vfio_virqfd_exit();
idr_destroy(vfio.group_idr);
cdev_del(vfio.group_cdev);
unregister_chrdev_region(vfio.group_devt, MINORMASK);
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index ac63ec0..c28882f 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -18,7 +18,7 @@
 static struct workqueue_struct *vfio_irqfd_cleanup_wq;
 static spinlock_t lock;
 
-int __init vfio_pci_virqfd_init(void)
+int __init vfio_virqfd_init(void)
 {
vfio_irqfd_cleanup_wq =
create_singlethread_workqueue(vfio-irqfd-cleanup);
@@ -30,7 +30,7 @@ int __init vfio_pci_virqfd_init(void)
return 0;
 }
 
-void vfio_pci_virqfd_exit(void)
+void vfio_virqfd_exit(void)
 {
destroy_workqueue(vfio_irqfd_cleanup_wq);
 }
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ce23a42..9fa02c8 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -140,8 +140,8 @@ struct virqfd {
struct virqfd   **pvirqfd;
 };
 
-extern int vfio_pci_virqfd_init(void);
-extern void vfio_pci_virqfd_exit(void);
+extern int vfio_virqfd_init(void);
+extern void vfio_virqfd_exit(void);
 extern int virqfd_enable(void *opaque,
 int (*handler)(void *, void *),
 void (*thread)(void *, void *),
-- 
1.8.3.2


[PATCHv7 23/26] vfio: add local lock in virqfd instead of depending on VFIO PCI

2014-09-23 Thread Antonios Motakis

Virqfd just needs to keep accesses to any struct *virqfd safe, but this
comes into play only when creating or destroying eventfds, so sharing
the same spinlock with the VFIO bus driver is not necessary.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/pci/vfio_pci_intrs.c | 10 +-
 drivers/vfio/virqfd.c | 24 +---
 include/linux/vfio.h  |  3 +--
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 3f909bb..e56c814 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -226,8 +226,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
 static void vfio_intx_disable(struct vfio_pci_device *vdev)
 {
 	vfio_intx_set_signal(vdev, -1);
-	virqfd_disable(vdev, &vdev->ctx[0].unmask);
-	virqfd_disable(vdev, &vdev->ctx[0].mask);
+	virqfd_disable(&vdev->ctx[0].unmask);
+	virqfd_disable(&vdev->ctx[0].mask);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
 	kfree(vdev->ctx);
@@ -377,8 +377,8 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
 	vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
 
 	for (i = 0; i < vdev->num_ctx; i++) {
-		virqfd_disable(vdev, &vdev->ctx[i].unmask);
-		virqfd_disable(vdev, &vdev->ctx[i].mask);
+		virqfd_disable(&vdev->ctx[i].unmask);
+		virqfd_disable(&vdev->ctx[i].mask);
 	}
 
 	if (msix) {
@@ -415,7 +415,7 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
 					     vfio_send_intx_eventfd, NULL,
 					     &vdev->ctx[0].unmask, fd);
 
-		virqfd_disable(vdev, &vdev->ctx[0].unmask);
+		virqfd_disable(&vdev->ctx[0].unmask);
 	}
 
return 0;
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 243eb61..27fa2f0 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -17,6 +17,7 @@
 #include pci/vfio_pci_private.h
 
 static struct workqueue_struct *vfio_irqfd_cleanup_wq;
+static spinlock_t lock;
 
 int __init vfio_pci_virqfd_init(void)
 {
@@ -25,6 +26,8 @@ int __init vfio_pci_virqfd_init(void)
if (!vfio_irqfd_cleanup_wq)
return -ENOMEM;
 
+	spin_lock_init(&lock);
+
return 0;
 }
 
@@ -53,21 +56,21 @@ static int virqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 
 	if (flags & POLLHUP) {
 		unsigned long flags;
-		spin_lock_irqsave(&virqfd->vdev->irqlock, flags);
+		spin_lock_irqsave(&lock, flags);
 
 		/*
 		 * The eventfd is closing, if the virqfd has not yet been
 		 * queued for release, as determined by testing whether the
-		 * vdev pointer to it is still valid, queue it now.  As
+		 * virqfd pointer to it is still valid, queue it now.  As
 		 * with kvm irqfds, we know we won't race against the virqfd
-		 * going away because we hold wqh->lock to get here.
+		 * going away because we hold the lock to get here.
 		 */
 		if (*(virqfd->pvirqfd) == virqfd) {
 			*(virqfd->pvirqfd) = NULL;
 			virqfd_deactivate(virqfd);
 		}
 
-		spin_unlock_irqrestore(&virqfd->vdev->irqlock, flags);
+		spin_unlock_irqrestore(&lock, flags);
 	}
 
 	return 0;
@@ -143,16 +146,16 @@ int virqfd_enable(struct vfio_pci_device *vdev,
 	 * we update the pointer to the virqfd under lock to avoid
 	 * pushing multiple jobs to release the same virqfd.
 	 */
-	spin_lock_irq(&vdev->irqlock);
+	spin_lock_irq(&lock);
 
 	if (*pvirqfd) {
-		spin_unlock_irq(&vdev->irqlock);
+		spin_unlock_irq(&lock);
 		ret = -EBUSY;
 		goto err_busy;
 	}
 	*pvirqfd = virqfd;
 
-	spin_unlock_irq(&vdev->irqlock);
+	spin_unlock_irq(&lock);
 
 	/*
 	 * Install our own custom wake-up handling so we are notified via
@@ -190,19 +193,18 @@ err_fd:
 }
 EXPORT_SYMBOL_GPL(virqfd_enable);
 
-void virqfd_disable(struct vfio_pci_device *vdev,
-		    struct virqfd **pvirqfd)
+void virqfd_disable(struct virqfd **pvirqfd)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&vdev->irqlock, flags);
+	spin_lock_irqsave(&lock, flags);
 
 	if (*pvirqfd) {
 		virqfd_deactivate(*pvirqfd);
 		*pvirqfd = NULL;
 	}
 
-	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	spin_unlock_irqrestore(&lock, flags);
 
 	/*
 	 * Block until we know all outstanding shutdown jobs have completed.
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index

[PATCHv7 22/26] vfio: move eventfd support code for VFIO_PCI to a separate file

2014-09-23 Thread Antonios Motakis

The virqfd functionality that VFIO_PCI uses to implement interrupt masking
and unmasking via an eventfd is generic enough to be reused by other
drivers. Move it to a separate file so that the code can be shared.

Also properly export virqfd_enable and virqfd_disable in the process.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/Makefile   |   4 +-
 drivers/vfio/pci/vfio_pci_intrs.c   | 213 ---
 drivers/vfio/pci/vfio_pci_private.h |   3 -
 drivers/vfio/virqfd.c   | 214 
 include/linux/vfio.h|  28 +
 5 files changed, 245 insertions(+), 217 deletions(-)
 create mode 100644 drivers/vfio/virqfd.c

diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index dadf0ca..d798b09 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,4 +1,6 @@
-obj-$(CONFIG_VFIO) += vfio.o
+vfio_core-y := vfio.o virqfd.o
+
+obj-$(CONFIG_VFIO) += vfio_core.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 9dd49c9..3f909bb 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -18,226 +18,13 @@
 #include linux/eventfd.h
 #include linux/pci.h
 #include linux/file.h
-#include linux/poll.h
 #include linux/vfio.h
 #include linux/wait.h
-#include linux/workqueue.h
 #include linux/slab.h
 
 #include vfio_pci_private.h
 
 /*
- * IRQfd - generic
- */
-struct virqfd {
-   struct vfio_pci_device  *vdev;
-   struct eventfd_ctx  *eventfd;
-   int (*handler)(struct vfio_pci_device *, void *);
-   void(*thread)(struct vfio_pci_device *, void *);
-   void*data;
-   struct work_struct  inject;
-   wait_queue_twait;
-   poll_table  pt;
-   struct work_struct  shutdown;
-   struct virqfd   **pvirqfd;
-};
-
-static struct workqueue_struct *vfio_irqfd_cleanup_wq;
-
-int __init vfio_pci_virqfd_init(void)
-{
-   vfio_irqfd_cleanup_wq =
-   create_singlethread_workqueue(vfio-irqfd-cleanup);
-   if (!vfio_irqfd_cleanup_wq)
-   return -ENOMEM;
-
-   return 0;
-}
-
-void vfio_pci_virqfd_exit(void)
-{
-   destroy_workqueue(vfio_irqfd_cleanup_wq);
-}
-
-static void virqfd_deactivate(struct virqfd *virqfd)
-{
-   queue_work(vfio_irqfd_cleanup_wq, virqfd-shutdown);
-}
-
-static int virqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void 
*key)
-{
-   struct virqfd *virqfd = container_of(wait, struct virqfd, wait);
-   unsigned long flags = (unsigned long)key;
-
-   if (flags  POLLIN) {
-   /* An event has been signaled, call function */
-   if ((!virqfd-handler ||
-virqfd-handler(virqfd-vdev, virqfd-data)) 
-   virqfd-thread)
-   schedule_work(virqfd-inject);
-   }
-
-   if (flags  POLLHUP) {
-   unsigned long flags;
-   spin_lock_irqsave(virqfd-vdev-irqlock, flags);
-
-   /*
-* The eventfd is closing, if the virqfd has not yet been
-* queued for release, as determined by testing whether the
-* vdev pointer to it is still valid, queue it now.  As
-* with kvm irqfds, we know we won't race against the virqfd
-* going away because we hold wqh-lock to get here.
-*/
-   if (*(virqfd-pvirqfd) == virqfd) {
-   *(virqfd-pvirqfd) = NULL;
-   virqfd_deactivate(virqfd);
-   }
-
-   spin_unlock_irqrestore(virqfd-vdev-irqlock, flags);
-   }
-
-   return 0;
-}
-
-static void virqfd_ptable_queue_proc(struct file *file,
-wait_queue_head_t *wqh, poll_table *pt)
-{
-   struct virqfd *virqfd = container_of(pt, struct virqfd, pt);
-   add_wait_queue(wqh, virqfd-wait);
-}
-
-static void virqfd_shutdown(struct work_struct *work)
-{
-   struct virqfd *virqfd = container_of(work, struct virqfd, shutdown);
-   u64 cnt;
-
-   eventfd_ctx_remove_wait_queue(virqfd-eventfd, virqfd-wait, cnt);
-   flush_work(virqfd-inject);
-   eventfd_ctx_put(virqfd-eventfd);
-
-   kfree(virqfd);
-}
-
-static void virqfd_inject(struct work_struct *work)
-{
-   struct virqfd *virqfd = container_of(work, struct virqfd, inject);
-   if (virqfd-thread)
-   virqfd-thread(virqfd-vdev, virqfd-data);
-}
-
-static int virqfd_enable(struct vfio_pci_device *vdev,
-int (*handler)(struct vfio_pci_device *, void *),
-void (*thread)(struct vfio_pci_device *, void

[PATCHv7 24/26] vfio: pass an opaque pointer on virqfd initialization

2014-09-23 Thread Antonios Motakis

VFIO_PCI passes the VFIO device structure *vdev via eventfd to the handler
that implements masking/unmasking of IRQs via an eventfd. We can replace
it in the virqfd infrastructure with an opaque type so we can make use
of the mechanism from other VFIO bus drivers.
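
As an illustration of the opaque-pointer pattern described above, here is a minimal userspace sketch. It is not the kernel code: `struct demo_device`, `struct demo_virqfd` and all `demo_*` functions are hypothetical names, and the thread callback is invoked directly where the kernel would schedule work. The generic part stores only a `void *` and hands it back to callbacks that know the real type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus-specific device structure. */
struct demo_device {
	int masked;
	int signals;
};

/* Generic record: it knows nothing about the device type. */
struct demo_virqfd {
	void *opaque;
	int (*handler)(void *opaque, void *data);
	void (*thread)(void *opaque, void *data);
	void *data;
};

/* Bus-specific handler: recovers its own type from the opaque pointer. */
static int demo_unmask_handler(void *opaque, void *unused)
{
	struct demo_device *dev = opaque;

	(void)unused;
	if (!dev->masked)
		return 0;	/* nothing to do, no follow-up needed */

	dev->masked = 0;
	return 1;		/* non-zero: request the thread callback */
}

/* Bus-specific follow-up, counting signals instead of using an eventfd. */
static void demo_send_signal(void *opaque, void *unused)
{
	struct demo_device *dev = opaque;

	(void)unused;
	dev->signals++;
}

/*
 * Same condition as the POLLIN branch of virqfd_wakeup(), except that
 * the thread callback runs inline here (assumption for the sketch).
 */
static void demo_virqfd_fire(struct demo_virqfd *v)
{
	assert(v != NULL);

	if ((!v->handler || v->handler(v->opaque, v->data)) && v->thread)
		v->thread(v->opaque, v->data);
}
```

A platform driver could register a different structure behind the same `void *` without the generic code changing, which is the point of the conversion.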

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/pci/vfio_pci_intrs.c | 11 +++
 drivers/vfio/virqfd.c | 17 -
 include/linux/vfio.h  | 12 ++--
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index e56c814..6ca22a8 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -27,8 +27,10 @@
 /*
  * INTx
  */
-static void vfio_send_intx_eventfd(struct vfio_pci_device *vdev, void *unused)
+static void vfio_send_intx_eventfd(void *opaque, void *unused)
 {
+   struct vfio_pci_device *vdev = opaque;
+
if (likely(is_intx(vdev)  !vdev-virq_disabled))
eventfd_signal(vdev-ctx[0].trigger, 1);
 }
@@ -71,9 +73,9 @@ void vfio_pci_intx_mask(struct vfio_pci_device *vdev)
  * a signal is necessary, which can then be handled via a work queue
  * or directly depending on the caller.
  */
-static int vfio_pci_intx_unmask_handler(struct vfio_pci_device *vdev,
-   void *unused)
+static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 {
+   struct vfio_pci_device *vdev = opaque;
struct pci_dev *pdev = vdev-pdev;
unsigned long flags;
int ret = 0;
@@ -411,7 +413,8 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device 
*vdev,
} else if (flags  VFIO_IRQ_SET_DATA_EVENTFD) {
int32_t fd = *(int32_t *)data;
if (fd = 0)
-   return virqfd_enable(vdev, vfio_pci_intx_unmask_handler,
+   return virqfd_enable((void *) vdev,
+vfio_pci_intx_unmask_handler,
 vfio_send_intx_eventfd, NULL,
 vdev-ctx[0].unmask, fd);
 
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 27fa2f0..ac63ec0 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -14,7 +14,6 @@
 #include linux/eventfd.h
 #include linux/file.h
 #include linux/slab.h
-#include pci/vfio_pci_private.h
 
 static struct workqueue_struct *vfio_irqfd_cleanup_wq;
 static spinlock_t lock;
@@ -49,7 +48,7 @@ static int virqfd_wakeup(wait_queue_t *wait, unsigned mode, 
int sync, void *key)
if (flags  POLLIN) {
/* An event has been signaled, call function */
if ((!virqfd-handler ||
-virqfd-handler(virqfd-vdev, virqfd-data)) 
+virqfd-handler(virqfd-opaque, virqfd-data)) 
virqfd-thread)
schedule_work(virqfd-inject);
}
@@ -99,13 +98,13 @@ static void virqfd_inject(struct work_struct *work)
 {
struct virqfd *virqfd = container_of(work, struct virqfd, inject);
if (virqfd-thread)
-   virqfd-thread(virqfd-vdev, virqfd-data);
+   virqfd-thread(virqfd-opaque, virqfd-data);
 }
 
-int virqfd_enable(struct vfio_pci_device *vdev,
-int (*handler)(struct vfio_pci_device *, void *),
-void (*thread)(struct vfio_pci_device *, void *),
-void *data, struct virqfd **pvirqfd, int fd)
+int virqfd_enable(void *opaque,
+ int (*handler)(void *, void *),
+ void (*thread)(void *, void *),
+ void *data, struct virqfd **pvirqfd, int fd)
 {
struct fd irqfd;
struct eventfd_ctx *ctx;
@@ -118,7 +117,7 @@ int virqfd_enable(struct vfio_pci_device *vdev,
return -ENOMEM;
 
virqfd-pvirqfd = pvirqfd;
-   virqfd-vdev = vdev;
+   virqfd-opaque = opaque;
virqfd-handler = handler;
virqfd-thread = thread;
virqfd-data = data;
@@ -171,7 +170,7 @@ int virqfd_enable(struct vfio_pci_device *vdev,
 * before we registered and trigger it as if we didn't miss it.
 */
if (events  POLLIN) {
-   if ((!handler || handler(vdev, data))  thread)
+   if ((!handler || handler(opaque, data))  thread)
schedule_work(virqfd-inject);
}
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fb6037b..ce23a42 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -128,10 +128,10 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct 
iommu_group *group,
  * IRQFD support
  */
 struct virqfd {
-   struct vfio_pci_device  *vdev;
+   void*opaque;
struct eventfd_ctx  *eventfd;
-   int (*handler)(struct vfio_pci_device *, void *);
-   void

[PATCHv7 26/26] vfio/platform: implement IRQ masking/unmasking via an eventfd

2014-09-23 Thread Antonios Motakis

With this patch the VFIO user will be able to set an eventfd that can be
used in order to mask and unmask IRQs of platform devices.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_irq.c | 48 +--
 drivers/vfio/platform/vfio_platform_private.h |  2 ++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index 90fa25a..4ea3d5a 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -45,11 +45,21 @@ static void vfio_platform_mask(struct vfio_platform_irq 
*irq_ctx)
spin_unlock_irqrestore(irq_ctx-lock, flags);
 }
 
+static int vfio_platform_mask_handler(void *opaque, void *unused)
+{
+   struct vfio_platform_irq *irq_ctx = opaque;
+
+   vfio_platform_mask(irq_ctx);
+
+   return 0;
+}
+
 static int vfio_platform_set_irq_mask(struct vfio_platform_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
 {
uint8_t irq_bitmap;
+   int32_t fd;
 
if (start != 0 || count != 1)
return -EINVAL;
@@ -75,7 +85,19 @@ static int vfio_platform_set_irq_mask(struct vfio_platform_device *vdev,
 		vfio_platform_mask(&vdev->irqs[index]);
 		return 0;
 
-	case VFIO_IRQ_SET_DATA_EVENTFD: /* XXX not implemented yet */
+	case VFIO_IRQ_SET_DATA_EVENTFD:
+		if (copy_from_user(&fd, data, sizeof(int32_t)))
+			return -EFAULT;
+
+		if (fd >= 0)
+			return virqfd_enable((void *) &vdev->irqs[index],
+					     vfio_platform_mask_handler,
+					     NULL, NULL,
+					     &vdev->irqs[index].mask, fd);
+
+		virqfd_disable(&vdev->irqs[index].mask);
+		return 0;
+
 	default:
 		return -ENOTTY;
 	}
@@ -97,11 +119,21 @@ static void vfio_platform_unmask(struct vfio_platform_irq 
*irq_ctx)
spin_unlock_irqrestore(irq_ctx-lock, flags);
 }
 
+static int vfio_platform_unmask_handler(void *opaque, void *unused)
+{
+   struct vfio_platform_irq *irq_ctx = opaque;
+
+   vfio_platform_unmask(irq_ctx);
+
+   return 0;
+}
+
 static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
 {
uint8_t irq_bitmap;
+   int32_t fd;
 
if (start != 0 || count != 1)
return -EINVAL;
@@ -123,7 +155,19 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
 		vfio_platform_unmask(&vdev->irqs[index]);
 		return 0;
 
-	case VFIO_IRQ_SET_DATA_EVENTFD: /* XXX not implemented yet */
+	case VFIO_IRQ_SET_DATA_EVENTFD:
+		if (copy_from_user(&fd, data, sizeof(int32_t)))
+			return -EFAULT;
+
+		if (fd >= 0)
+			return virqfd_enable((void *) &vdev->irqs[index],
+					     vfio_platform_unmask_handler,
+					     NULL, NULL,
+					     &vdev->irqs[index].unmask, fd);
+
+		virqfd_disable(&vdev->irqs[index].unmask);
+		return 0;
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 500e299..dd1beda 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -32,6 +32,8 @@ struct vfio_platform_irq {
struct eventfd_ctx  *trigger;
boolmasked;
spinlock_t  lock;
+   struct virqfd   *unmask;
+   struct virqfd   *mask;
 };
 
 struct vfio_platform_region {
-- 
1.8.3.2


[PATCHv7 20/26] vfio/platform: trigger an interrupt via eventfd

2014-09-23 Thread Antonios Motakis

This patch allows setting an eventfd for a platform device's interrupt,
and also triggering the interrupt eventfd from userspace for testing.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_irq.c | 89 ++-
 drivers/vfio/platform/vfio_platform_private.h |  2 +
 2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index 007b386..25a7825 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -45,11 +45,91 @@ static int vfio_platform_set_irq_unmask(struct 
vfio_platform_device *vdev,
return -EINVAL;
 }
 
+static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
+{
+   struct vfio_platform_irq *irq_ctx = dev_id;
+
+   eventfd_signal(irq_ctx-trigger, 1);
+
+   return IRQ_HANDLED;
+}
+
+static int vfio_set_trigger(struct vfio_platform_device *vdev,
+   int index, int fd)
+{
+   struct vfio_platform_irq *irq = vdev-irqs[index];
+   struct eventfd_ctx *trigger;
+   int ret;
+
+   if (irq-trigger) {
+   free_irq(irq-hwirq, irq);
+   kfree(irq-name);
+   eventfd_ctx_put(irq-trigger);
+   irq-trigger = NULL;
+   }
+
+   if (fd  0) /* Disable only */
+   return 0;
+
+   irq-name = kasprintf(GFP_KERNEL, vfio-irq[%d](%s),
+   irq-hwirq, vdev-name);
+   if (!irq-name)
+   return -ENOMEM;
+
+   trigger = eventfd_ctx_fdget(fd);
+   if (IS_ERR(trigger)) {
+   kfree(irq-name);
+   return PTR_ERR(trigger);
+   }
+
+   irq-trigger = trigger;
+
+   ret = request_irq(irq-hwirq, vfio_irq_handler, 0, irq-name, irq);
+   if (ret) {
+   kfree(irq-name);
+   eventfd_ctx_put(trigger);
+   irq-trigger = NULL;
+   return ret;
+   }
+
+   return 0;
+}
+
 static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 unsigned index, unsigned start,
 unsigned count, uint32_t flags, void *data)
 {
-   return -EINVAL;
+   struct vfio_platform_irq *irq = vdev-irqs[index];
+   uint8_t irq_bitmap;
+   int32_t fd;
+
+   switch (flags  VFIO_IRQ_SET_DATA_TYPE_MASK) {
+   case VFIO_IRQ_SET_DATA_NONE:
+   if (count == 0)
+   return vfio_set_trigger(vdev, index, -1);
+
+   vfio_irq_handler(irq-hwirq, irq);
+   return 0;
+
+   case VFIO_IRQ_SET_DATA_BOOL:
+   if (copy_from_user(irq_bitmap, data, sizeof(uint8_t)))
+   return -EFAULT;
+
+   if (irq_bitmap == 0x1) {
+   vfio_irq_handler(irq-hwirq, irq);
+   return 0;
+   }
+
+   return -EINVAL;
+
+   case VFIO_IRQ_SET_DATA_EVENTFD:
+   if (copy_from_user(fd, data, sizeof(int32_t)))
+   return -EFAULT;
+
+   return vfio_set_trigger(vdev, index, fd);
+   }
+
+   return -EFAULT;
 }
 
 int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
@@ -95,7 +175,7 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
if (hwirq  0)
goto err;
 
-   vdev-irqs[i].flags = 0;
+   vdev-irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
vdev-irqs[i].count = 1;
vdev-irqs[i].hwirq = hwirq;
}
@@ -110,6 +190,11 @@ err:
 
 void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
 {
+   int i;
+
+   for (i = 0; i  vdev-num_irqs; i++)
+   vfio_set_trigger(vdev, i, -1);
+
vdev-num_irqs = 0;
kfree(vdev-irqs);
 }
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 4201b94..765b371 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -28,6 +28,8 @@ struct vfio_platform_irq {
u32 flags;
u32 count;
int hwirq;
+   char*name;
+   struct eventfd_ctx  *trigger;
 };
 
 struct vfio_platform_region {
-- 
1.8.3.2


[PATCHv7 21/26] vfio/platform: support for maskable and automasked interrupts

2014-09-23 Thread Antonios Motakis

Add support for masking interrupts, and for automasked interrupts.
Level-sensitive interrupts are exposed as automasked interrupts and
are masked and disabled automatically when they fire.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_irq.c | 120 --
 drivers/vfio/platform/vfio_platform_private.h |   2 +
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index 25a7825..90fa25a 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -31,27 +31,129 @@
 
 #include vfio_platform_private.h
 
+static void vfio_platform_mask(struct vfio_platform_irq *irq_ctx)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(irq_ctx-lock, flags);
+
+   if (!irq_ctx-masked) {
+   disable_irq(irq_ctx-hwirq);
+   irq_ctx-masked = true;
+   }
+
+   spin_unlock_irqrestore(irq_ctx-lock, flags);
+}
+
 static int vfio_platform_set_irq_mask(struct vfio_platform_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
 {
-   return -EINVAL;
+   uint8_t irq_bitmap;
+
+   if (start != 0 || count != 1)
+   return -EINVAL;
+
+   switch (flags  VFIO_IRQ_SET_DATA_TYPE_MASK) {
+   case VFIO_IRQ_SET_DATA_BOOL:
+   if (copy_from_user(irq_bitmap, data, sizeof(uint8_t)))
+   return -EFAULT;
+
+   if (irq_bitmap != 0x1)
+   return -EINVAL;
+
+   /*
+* The following fall through is both intentional and safe.
+* VFIO_IRQ_SET_DATA_BOOL allows to handle an array of IRQs
+* on the same index. For VFIO platform devices we always have
+* one IRQ per index, so as soon as we check that the user
+* provided bitmap only refers to one single IRQ, we can safely
+* share the rest of the logic with VFIO_IRQ_SET_DATA_NONE.
+*/
+
+   case VFIO_IRQ_SET_DATA_NONE:
+   vfio_platform_mask(vdev-irqs[index]);
+   return 0;
+
+   case VFIO_IRQ_SET_DATA_EVENTFD: /* XXX not implemented yet */
+   default:
+   return -ENOTTY;
+   }
+
+   return 0;
+}
+
+static void vfio_platform_unmask(struct vfio_platform_irq *irq_ctx)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(irq_ctx-lock, flags);
+
+   if (irq_ctx-masked) {
+   enable_irq(irq_ctx-hwirq);
+   irq_ctx-masked = false;
+   }
+
+   spin_unlock_irqrestore(irq_ctx-lock, flags);
 }
 
 static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
 {
-   return -EINVAL;
+   uint8_t irq_bitmap;
+
+   if (start != 0 || count != 1)
+   return -EINVAL;
+
+   switch (flags  VFIO_IRQ_SET_DATA_TYPE_MASK) {
+   case VFIO_IRQ_SET_DATA_BOOL:
+   if (copy_from_user(irq_bitmap, data, sizeof(uint8_t)))
+   return -EFAULT;
+
+   if (irq_bitmap != 0x1)
+   return -EINVAL;
+
+   /*
+* The following fall through is both intentional and safe,
+* as in vfio_platform_set_irq_mask().
+*/
+
+   case VFIO_IRQ_SET_DATA_NONE:
+   vfio_platform_unmask(vdev-irqs[index]);
+   return 0;
+
+   case VFIO_IRQ_SET_DATA_EVENTFD: /* XXX not implemented yet */
+   default:
+   return -ENOTTY;
+   }
+
+   return 0;
 }
 
 static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
 {
 	struct vfio_platform_irq *irq_ctx = dev_id;
+	unsigned long flags;
+	int ret = IRQ_NONE;
 
-	eventfd_signal(irq_ctx->trigger, 1);
+	spin_lock_irqsave(&irq_ctx->lock, flags);
 
-	return IRQ_HANDLED;
+	if (!irq_ctx->masked) {
+		ret = IRQ_HANDLED;
+
+		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
+			disable_irq_nosync(irq_ctx->hwirq);
+			irq_ctx->masked = true;
+		}
+	}
+
+	spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+	if (ret == IRQ_HANDLED)
+		eventfd_signal(irq_ctx->trigger, 1);
+
+	return ret;
 }
 
 static int vfio_set_trigger(struct vfio_platform_device *vdev,
@@ -175,9 +277,17 @@ int vfio_platform_irq_init(struct vfio_platform_device 
*vdev)
if (hwirq  0)
goto err;
 
-   vdev-irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
+   spin_lock_init(vdev-irqs[i].lock);
+
+   vdev-irqs[i].flags =

[PATCHv7 19/26] vfio/platform: initial interrupts support code

2014-09-23 Thread Antonios Motakis

This patch is a skeleton for the VFIO_DEVICE_SET_IRQS IOCTL, around which
most IRQ functionality is implemented in VFIO.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_common.c  | 41 ++--
 drivers/vfio/platform/vfio_platform_irq.c | 56 +++
 drivers/vfio/platform/vfio_platform_private.h |  6 +++
 3 files changed, 100 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index 6dccf22..8df0912 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -178,10 +178,43 @@ static long vfio_platform_ioctl(void *device_data,
 
return copy_to_user((void __user *)arg, info, minsz);
 
-	} else if (cmd == VFIO_DEVICE_SET_IRQS)
-		return -EINVAL;
+	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
+		struct vfio_irq_set hdr;
+		int ret = 0;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (hdr.argsz < minsz)
+			return -EINVAL;
+
+		if (hdr.index >= vdev->num_irqs)
+			return -EINVAL;
+
+		if (hdr.start != 0 || hdr.count > 1)
+			return -EINVAL;
 
-	else if (cmd == VFIO_DEVICE_RESET)
+		if (hdr.count == 0 &&
+		    (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE) ||
+		     !(hdr.flags & VFIO_IRQ_SET_ACTION_TRIGGER)))
+			return -EINVAL;
+
+		if (hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
+				  VFIO_IRQ_SET_ACTION_TYPE_MASK))
+			return -EINVAL;
+
+		mutex_lock(&vdev->igate);
+
+		ret = vfio_platform_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
+						   hdr.start, hdr.count,
+						   (void *)arg+minsz);
+		mutex_unlock(&vdev->igate);
+
+		return ret;
+
+	} else if (cmd == VFIO_DEVICE_RESET)
 		return -EINVAL;
 
return -ENOTTY;
@@ -377,6 +410,8 @@ int vfio_platform_probe_common(struct vfio_platform_device 
*vdev,
return ret;
}
 
+   mutex_init(vdev-igate);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_platform_probe_common);
diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index d99c71c..007b386 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -31,6 +31,53 @@
 
 #include vfio_platform_private.h
 
+static int vfio_platform_set_irq_mask(struct vfio_platform_device *vdev,
+   unsigned index, unsigned start,
+   unsigned count, uint32_t flags, void *data)
+{
+   return -EINVAL;
+}
+
+static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
+   unsigned index, unsigned start,
+   unsigned count, uint32_t flags, void *data)
+{
+   return -EINVAL;
+}
+
+static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
+unsigned index, unsigned start,
+unsigned count, uint32_t flags, void *data)
+{
+   return -EINVAL;
+}
+
+int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
+uint32_t flags, unsigned index, unsigned start,
+unsigned count, void *data)
+{
+   int (*func)(struct vfio_platform_device *vdev, unsigned index,
+   unsigned start, unsigned count, uint32_t flags,
+   void *data) = NULL;
+
+   switch (flags  VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+   case VFIO_IRQ_SET_ACTION_MASK:
+   func = vfio_platform_set_irq_mask;
+   break;
+   case VFIO_IRQ_SET_ACTION_UNMASK:
+   func = vfio_platform_set_irq_unmask;
+   break;
+   case VFIO_IRQ_SET_ACTION_TRIGGER:
+   func = vfio_platform_set_irq_trigger;
+   break;
+   }
+
+   if (!func)
+   return -ENOTTY;
+
+   return func(vdev, index, start, count, flags, data);
+}
+
 int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 {
int cnt = 0, i;
@@ -43,13 +90,22 @@ int vfio_platform_irq_init(struct vfio_platform_device 
*vdev)
return -ENOMEM;
 
for (i = 0; i  cnt; i++) {
+   int hwirq = vdev-get_irq(vdev, i);
+
+   if (hwirq  0)
+   goto err;
+
vdev-irqs[i].flags = 0;
vdev-irqs[i].count =

Re: [PATCHv7 01/26] iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC

2014-09-23 Thread Will Deacon

Hi Antonios,

On Tue, Sep 23, 2014 at 03:46:00PM +0100, Antonios Motakis wrote:
 Exposing the XN flag of the SMMU driver as IOMMU_NOEXEC instead of
 IOMMU_EXEC makes it enforceable, since for IOMMUs that don't support
 the XN flag pages will always be executable.
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/iommu/arm-smmu.c | 9 +
  include/linux/iommu.h| 2 +-
  2 files changed, 6 insertions(+), 5 deletions(-)

[...]

 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index 20f9a52..e1a644c 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -27,7 +27,7 @@
  #define IOMMU_READ	(1 << 0)
  #define IOMMU_WRITE	(1 << 1)
  #define IOMMU_CACHE	(1 << 2) /* DMA cache coherency */
 -#define IOMMU_EXEC	(1 << 3)
 +#define IOMMU_NOEXEC	(1 << 3)

This hunk needs to be a separate patch merged by Joerg before I can take the
arm-smmu part (which looks fine).

Will

[PATCHv7 16/26] vfio/platform: read and write support for the device fd

2014-09-23 Thread Antonios Motakis

VFIO returns a file descriptor which we can use to manipulate the memory
regions of the device. Usually, the user will mmap memory regions that are
addressable on page boundaries. For memory regions where this is not the
case, we cannot provide mmap functionality due to security concerns, so we
also need to allow reading and writing the memory regions via the file
descriptor.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_common.c  | 121 +-
 drivers/vfio/platform/vfio_platform_private.h |   1 +
 2 files changed, 119 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 469bdcb..589b226 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -48,7 +48,8 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
 
 		vdev->regions[i].addr = res->start;
 		vdev->regions[i].size = resource_size(res);
-		vdev->regions[i].flags = 0;
+		vdev->regions[i].flags = VFIO_REGION_INFO_FLAG_READ
+					| VFIO_REGION_INFO_FLAG_WRITE;
 	}
 
 	vdev->num_regions = cnt;
@@ -61,6 +62,11 @@ err:
 
 static void vfio_platform_regions_cleanup(struct vfio_platform_device *vdev)
 {
+	int i;
+
+	for (i = 0; i < vdev->num_regions; i++)
+		iounmap(vdev->regions[i].ioaddr);
+
 	vdev->num_regions = 0;
 	kfree(vdev->regions);
 }
@@ -155,13 +161,122 @@ static long vfio_platform_ioctl(void *device_data,
 static ssize_t vfio_platform_read(void *device_data, char __user *buf,
 				  size_t count, loff_t *ppos)
 {
-	return -EINVAL;
+	struct vfio_platform_device *vdev = device_data;
+	unsigned int index = VFIO_PLATFORM_OFFSET_TO_INDEX(*ppos);
+	loff_t off = *ppos & VFIO_PLATFORM_OFFSET_MASK;
+	unsigned int done = 0;
+
+	if (index >= vdev->num_regions)
+		return -EINVAL;
+
+	if (!vdev->regions[index].ioaddr) {
+		vdev->regions[index].ioaddr =
+			ioremap_nocache(vdev->regions[index].addr,
+					vdev->regions[index].size);
+
+		if (!vdev->regions[index].ioaddr)
+			return -ENOMEM;
+	}
+
+	while (count) {
+		size_t filled;
+
+		if (count >= 4 && !(off % 4)) {
+			u32 val;
+
+			val = ioread32(vdev->regions[index].ioaddr + off);
+			if (copy_to_user(buf, &val, 4))
+				goto err;
+
+			filled = 4;
+		} else if (count >= 2 && !(off % 2)) {
+			u16 val;
+
+			val = ioread16(vdev->regions[index].ioaddr + off);
+			if (copy_to_user(buf, &val, 2))
+				goto err;
+
+			filled = 2;
+		} else {
+			u8 val;
+
+			val = ioread8(vdev->regions[index].ioaddr + off);
+			if (copy_to_user(buf, &val, 1))
+				goto err;
+
+			filled = 1;
+		}
+
+
+		count -= filled;
+		done += filled;
+		off += filled;
+		buf += filled;
+	}
+
+	return done;
+err:
+	return -EFAULT;
 }
 
 static ssize_t vfio_platform_write(void *device_data, const char __user *buf,
 				   size_t count, loff_t *ppos)
 {
-	return -EINVAL;
+	struct vfio_platform_device *vdev = device_data;
+	unsigned int index = VFIO_PLATFORM_OFFSET_TO_INDEX(*ppos);
+	loff_t off = *ppos & VFIO_PLATFORM_OFFSET_MASK;
+	unsigned int done = 0;
+
+	if (index >= vdev->num_regions)
+		return -EINVAL;
+
+	if (!vdev->regions[index].ioaddr) {
+		vdev->regions[index].ioaddr =
+			ioremap_nocache(vdev->regions[index].addr,
+					vdev->regions[index].size);
+
+		if (!vdev->regions[index].ioaddr)
+			return -ENOMEM;
+	}
+
+	while (count) {
+		size_t filled;
+
+		if (count >= 4 && !(off % 4)) {
+			u32 val;
+
+			if (copy_from_user(&val, buf, 4))
+				goto err;
+			iowrite32(val, vdev->regions[index].ioaddr + off);
+
+			filled = 4;
+		} else if (count >= 2 && !(off % 2)) {
+			u16 val;
+
+			if (copy_from_user(&val, buf, 2))
+				goto err;
+			iowrite16(val, vdev->regions[index].ioaddr + off);
+
+			filled = 2;
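
The width selection used by the read and write loops in this patch can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not part of the patch: the helper names next_access_width and
split_access are hypothetical, and no real MMIO is touched. It shows how a
request is broken into 4-, 2- and 1-byte accesses depending on the remaining
count and the alignment of the offset.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the widest access (4, 2 or 1 bytes) that
 * fits in the remaining count and is naturally aligned at off.
 */
static size_t next_access_width(uint64_t off, size_t count)
{
    if (count >= 4 && !(off % 4))
        return 4;
    if (count >= 2 && !(off % 2))
        return 2;
    return 1;
}

/*
 * Hypothetical helper: record the sequence of access widths needed to
 * cover count bytes starting at off. Returns the number of accesses
 * written to out (at most max).
 */
static size_t split_access(uint64_t off, size_t count,
                           size_t *out, size_t max)
{
    size_t n = 0;

    while (count && n < max) {
        size_t width = next_access_width(off, count);

        out[n++] = width;
        off += width;
        count -= width;
    }

    return n;
}
```

For example, a 7-byte request at offset 1 is split into a 1-byte, a 2-byte
and a 4-byte access, since the offset only becomes 4-byte aligned after the
first three bytes.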
+

[PATCHv7 14/26] vfio/platform: return info for bound device

2014-09-23 Thread Antonios Motakis

A VFIO userspace driver will start by opening the VFIO device
that corresponds to an IOMMU group, and will use the ioctl interface
to get the basic device info, such as number of memory regions and
interrupts, and their properties. This patch enables the
VFIO_DEVICE_GET_INFO ioctl call.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_common.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 07d02dc..ba5edfd 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data)
 static long vfio_platform_ioctl(void *device_data,
 				unsigned int cmd, unsigned long arg)
 {
-	if (cmd == VFIO_DEVICE_GET_INFO)
-		return -EINVAL;
+	struct vfio_platform_device *vdev = device_data;
+	unsigned long minsz;
+
+	if (cmd == VFIO_DEVICE_GET_INFO) {
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		info.flags = vdev->flags;
+		info.num_regions = 0;
+		info.num_irqs = 0;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
 
-	else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
 		return -EINVAL;
 
 	else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
-- 
1.8.3.2
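
The argsz/minsz handshake used by VFIO_DEVICE_GET_INFO in this patch can be
illustrated with a small userspace sketch. Everything here is an assumption
made for illustration: struct demo_info only mimics the shape of an info
structure and is not the uapi layout, demo_offsetofend stands in for the
kernel's offsetofend(), and memcpy() stands in for copy_from_user() and
copy_to_user().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info structure; the caller sets argsz to its buffer size. */
struct demo_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t num_regions;
    uint32_t num_irqs;
};

/* Offset of the first byte after a member, like the kernel's offsetofend(). */
#define demo_offsetofend(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/*
 * Fill in the caller's buffer. Returns 0 on success, or -1 if the caller
 * declared a buffer too small to hold every field up to num_irqs.
 */
static int demo_get_info(void *user_buf, uint32_t flags)
{
    struct demo_info info;
    size_t minsz = demo_offsetofend(struct demo_info, num_irqs);

    memcpy(&info, user_buf, minsz);     /* stands in for copy_from_user() */

    if (info.argsz < minsz)
        return -1;

    info.flags = flags;
    info.num_regions = 0;
    info.num_irqs = 0;

    memcpy(user_buf, &info, minsz);     /* stands in for copy_to_user() */
    return 0;
}
```

Checking argsz against minsz lets the structure grow in later versions while
callers built against an older, shorter layout are still rejected cleanly
instead of being overrun.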


[PATCHv7 15/26] vfio/platform: return info for device memory mapped IO regions

2014-09-23 Thread Antonios Motakis

This patch enables the VFIO_DEVICE_GET_REGION_INFO ioctl call,
which allows the user to learn about the available MMIO resources of
a device.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_common.c  | 84 +--
 drivers/vfio/platform/vfio_platform_private.h | 19 ++
 2 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index ba5edfd..469bdcb 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -27,17 +27,73 @@
 
 #include "vfio_platform_private.h"
 
+static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
+{
+	int cnt = 0, i;
+
+	while (vdev->get_resource(vdev, cnt))
+		cnt++;
+
+	vdev->regions = kcalloc(cnt, sizeof(struct vfio_platform_region),
+				GFP_KERNEL);
+	if (!vdev->regions)
+		return -ENOMEM;
+
+	for (i = 0; i < cnt;  i++) {
+		struct resource *res =
+			vdev->get_resource(vdev, i);
+
+		if (!res)
+			goto err;
+
+		vdev->regions[i].addr = res->start;
+		vdev->regions[i].size = resource_size(res);
+		vdev->regions[i].flags = 0;
+	}
+
+	vdev->num_regions = cnt;
+
+	return 0;
+err:
+	kfree(vdev->regions);
+	return -EINVAL;
+}
+
+static void vfio_platform_regions_cleanup(struct vfio_platform_device *vdev)
+{
+	vdev->num_regions = 0;
+	kfree(vdev->regions);
+}
+
 static void vfio_platform_release(void *device_data)
 {
+	struct vfio_platform_device *vdev = device_data;
+
+	if (atomic_dec_and_test(&vdev->refcnt))
+		vfio_platform_regions_cleanup(vdev);
+
 	module_put(THIS_MODULE);
 }
 
 static int vfio_platform_open(void *device_data)
 {
+	struct vfio_platform_device *vdev = device_data;
+	int ret;
+
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
 
+	if (atomic_inc_return(&vdev->refcnt) == 1) {
+		ret = vfio_platform_regions_init(vdev);
+		if (ret)
+			goto err_reg;
+	}
+
 	return 0;
+
+err_reg:
+	module_put(THIS_MODULE);
+	return ret;
 }
 
 static long vfio_platform_ioctl(void *device_data,
@@ -58,18 +114,36 @@ static long vfio_platform_ioctl(void *device_data,
 			return -EINVAL;
 
 		info.flags = vdev->flags;
-		info.num_regions = 0;
+		info.num_regions = vdev->num_regions;
 		info.num_irqs = 0;
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 
-	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
-		return -EINVAL;
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct vfio_region_info info;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		if (info.index >= vdev->num_regions)
+			return -EINVAL;
+
+		/* map offset to the physical address  */
+		info.offset = VFIO_PLATFORM_INDEX_TO_OFFSET(info.index);
+		info.size = vdev->regions[info.index].size;
+		info.flags = vdev->regions[info.index].flags;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
 
-	else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
+	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
 		return -EINVAL;
 
-	else if (cmd == VFIO_DEVICE_SET_IRQS)
+	} else if (cmd == VFIO_DEVICE_SET_IRQS)
 		return -EINVAL;
 
 	else if (cmd == VFIO_DEVICE_RESET)
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index ef76737..383164a 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -15,7 +15,26 @@
 #ifndef VFIO_PLATFORM_PRIVATE_H
 #define VFIO_PLATFORM_PRIVATE_H
 
+#define VFIO_PLATFORM_OFFSET_SHIFT   40
+#define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
+
+#define VFIO_PLATFORM_OFFSET_TO_INDEX(off)	\
+	(off >> VFIO_PLATFORM_OFFSET_SHIFT)
+
+#define VFIO_PLATFORM_INDEX_TO_OFFSET(index)	\
+	((u64)(index) << VFIO_PLATFORM_OFFSET_SHIFT)
+
+struct vfio_platform_region {
+	u64			addr;
+	resource_size_t		size;
+	u32			flags;
+};
+
 struct vfio_platform_device {
+	struct vfio_platform_region	*regions;
+	u32				num_regions;
+	atomic_t			refcnt;
+
+
/*
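
The region index is carried in the upper bits of the file offset, as the
VFIO_PLATFORM_OFFSET_* macros in this patch show. A minimal userspace sketch
of the same arithmetic follows; the demo_* names are hypothetical stand-ins
for the macros, and only the shift value of 40 is taken from the patch.

```c
#include <stdint.h>

/* Same shift as VFIO_PLATFORM_OFFSET_SHIFT in the patch. */
#define DEMO_OFFSET_SHIFT 40
#define DEMO_OFFSET_MASK  ((UINT64_C(1) << DEMO_OFFSET_SHIFT) - 1)

/* File offset at which region number index starts. */
static uint64_t demo_index_to_offset(uint32_t index)
{
    return (uint64_t)index << DEMO_OFFSET_SHIFT;
}

/* Region number encoded in a file position. */
static uint32_t demo_offset_to_index(uint64_t pos)
{
    return (uint32_t)(pos >> DEMO_OFFSET_SHIFT);
}

/* Byte offset inside the region encoded in a file position. */
static uint64_t demo_offset_in_region(uint64_t pos)
{
    return pos & DEMO_OFFSET_MASK;
}
```

With a 40-bit shift, each region gets a 1 TiB window of the file offset
space, so a position of demo_index_to_offset(2) + 0x10 decodes to region 2,
byte 0x10.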

[PATCHv7 18/26] vfio/platform: return IRQ info

2014-09-23 Thread Antonios Motakis

Return information for the interrupts exposed by the device.
This patch extends VFIO_DEVICE_GET_INFO with the number of IRQs
and enables VFIO_DEVICE_GET_IRQ_INFO.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/Makefile|  2 +-
 drivers/vfio/platform/vfio_platform_common.c  | 30 --
 drivers/vfio/platform/vfio_platform_irq.c | 59 +++
 drivers/vfio/platform/vfio_platform_private.h | 10 +
 4 files changed, 97 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/platform/vfio_platform_irq.c

diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 1957170..81de144 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,5 +1,5 @@
 
-vfio-platform-y := vfio_platform.o vfio_platform_common.o
+vfio-platform-y := vfio_platform.o vfio_platform_common.o vfio_platform_irq.o
 
 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 5551d32..6dccf22 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -80,8 +80,10 @@ static void vfio_platform_release(void *device_data)
 {
 	struct vfio_platform_device *vdev = device_data;
 
-	if (atomic_dec_and_test(&vdev->refcnt))
+	if (atomic_dec_and_test(&vdev->refcnt)) {
 		vfio_platform_regions_cleanup(vdev);
+		vfio_platform_irq_cleanup(vdev);
+	}
 
 	module_put(THIS_MODULE);
 }
@@ -98,10 +100,16 @@ static int vfio_platform_open(void *device_data)
 		ret = vfio_platform_regions_init(vdev);
 		if (ret)
 			goto err_reg;
+
+		ret = vfio_platform_irq_init(vdev);
+		if (ret)
+			goto err_irq;
 	}
 
 	return 0;
 
+err_irq:
+	vfio_platform_regions_cleanup(vdev);
 err_reg:
 	module_put(THIS_MODULE);
 	return ret;
@@ -126,7 +134,7 @@ static long vfio_platform_ioctl(void *device_data,
 
 		info.flags = vdev->flags;
 		info.num_regions = vdev->num_regions;
-		info.num_irqs = 0;
+		info.num_irqs = vdev->num_irqs;
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 
@@ -152,7 +160,23 @@ static long vfio_platform_ioctl(void *device_data,
 		return copy_to_user((void __user *)arg, &info, minsz);
 
 	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
-		return -EINVAL;
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		if (info.index >= vdev->num_irqs)
+			return -EINVAL;
+
+		info.flags = vdev->irqs[info.index].flags;
+		info.count = vdev->irqs[info.index].count;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
 
 	} else if (cmd == VFIO_DEVICE_SET_IRQS)
 		return -EINVAL;
diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
new file mode 100644
index 000..d99c71c
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -0,0 +1,59 @@
+/*
+ * VFIO platform devices interrupt handling
+ *
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis a.mota...@virtualopensystems.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+
+#include "vfio_platform_private.h"
+
+int vfio_platform_irq_init(struct vfio_platform_device *vdev)
+{
+	int cnt = 0, i;
+
+	while (vdev->get_irq(vdev, cnt) > 0)
+		cnt++;
+
+	vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
+	if (!vdev->irqs)
+		return -ENOMEM;
+
+	for (i = 0; i < cnt; i++) {
+		vdev->irqs[i].flags = 0;
+		vdev->irqs[i].count = 1;
+	}
+
+	vdev->num_irqs = cnt;
+
+	return 0;
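
The IRQ discovery loop in vfio_platform_irq_init() probes increasing indices
until the callback stops returning a positive number. A small userspace
sketch of that counting pattern is given below; struct demo_dev,
demo_get_irq and demo_count_irqs are hypothetical names, and a plain array
stands in for the bus-specific lookup.

```c
#include <stddef.h>

/* Hypothetical callback type: returns the IRQ number at index i, or <= 0. */
typedef int (*demo_get_irq_fn)(const void *dev, int i);

/* Hypothetical device with a fixed table of IRQ numbers. */
struct demo_dev {
    const int *irqs;
    int nr;
};

static int demo_get_irq(const void *dev, int i)
{
    const struct demo_dev *d = dev;

    return (i >= 0 && i < d->nr) ? d->irqs[i] : 0;
}

/* Count IRQs by probing indices until the callback returns <= 0. */
static int demo_count_irqs(const void *dev, demo_get_irq_fn get_irq)
{
    int cnt = 0;

    while (get_irq(dev, cnt) > 0)
        cnt++;

    return cnt;
}
```

Note that with this pattern the count stops at the first non-positive entry,
so a table such as { 34, 35, 0, 40 } is reported as having two IRQs.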

[PATCHv7 13/26] vfio: amba: add the VFIO for AMBA devices module to Kconfig

2014-09-23 Thread Antonios Motakis

Enable building the VFIO AMBA driver. VFIO_AMBA depends on VFIO_PLATFORM,
since it is sharing a portion of the code, and it is essentially implemented
as a platform device whose resources are discovered via AMBA specific APIs
in the kernel.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/Kconfig  | 10 ++
 drivers/vfio/platform/Makefile |  4 
 2 files changed, 14 insertions(+)

diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
index c51af17..8b97786 100644
--- a/drivers/vfio/platform/Kconfig
+++ b/drivers/vfio/platform/Kconfig
@@ -7,3 +7,13 @@ config VFIO_PLATFORM
  framework.
 
  If you don't know what to do here, say N.
+
+config VFIO_AMBA
+	tristate "VFIO support for AMBA devices"
+	depends on VFIO && VFIO_PLATFORM && EVENTFD && ARM_AMBA
+   help
+ Support for ARM AMBA devices with VFIO. This is required to make
+ use of ARM AMBA devices present on the system using the VFIO
+ framework.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 279862b..1957170 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -2,3 +2,7 @@
 vfio-platform-y := vfio_platform.o vfio_platform_common.o
 
 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
+
+vfio-amba-y := vfio_amba.o
+
+obj-$(CONFIG_VFIO_AMBA) += vfio-amba.o
-- 
1.8.3.2


[PATCHv7 09/26] vfio/platform: initial skeleton of VFIO support for platform devices

2014-09-23 Thread Antonios Motakis

This patch forms the common skeleton code for platform devices support
with VFIO. This will include the core functionality of VFIO_PLATFORM,
however binding to the device and discovering the device resources will
be done with the help of a separate file where any Linux platform bus
specific code will reside.

This will allow us to implement support for also discovering AMBA devices
and their resources, but still reuse a large part of the VFIO_PLATFORM
implementation.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_platform_common.c  | 129 ++
 drivers/vfio/platform/vfio_platform_private.h |  35 +++
 2 files changed, 164 insertions(+)
 create mode 100644 drivers/vfio/platform/vfio_platform_common.c
 create mode 100644 drivers/vfio/platform/vfio_platform_private.h

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
new file mode 100644
index 000..07d02dc
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis a.mota...@virtualopensystems.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/io.h>
+
+#include "vfio_platform_private.h"
+
+static void vfio_platform_release(void *device_data)
+{
+   module_put(THIS_MODULE);
+}
+
+static int vfio_platform_open(void *device_data)
+{
+   if (!try_module_get(THIS_MODULE))
+   return -ENODEV;
+
+   return 0;
+}
+
+static long vfio_platform_ioctl(void *device_data,
+  unsigned int cmd, unsigned long arg)
+{
+   if (cmd == VFIO_DEVICE_GET_INFO)
+   return -EINVAL;
+
+   else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
+   return -EINVAL;
+
+   else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
+   return -EINVAL;
+
+   else if (cmd == VFIO_DEVICE_SET_IRQS)
+   return -EINVAL;
+
+   else if (cmd == VFIO_DEVICE_RESET)
+   return -EINVAL;
+
+   return -ENOTTY;
+}
+
+static ssize_t vfio_platform_read(void *device_data, char __user *buf,
+size_t count, loff_t *ppos)
+{
+   return -EINVAL;
+}
+
+static ssize_t vfio_platform_write(void *device_data, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+   return -EINVAL;
+}
+
+static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
+{
+   return -EINVAL;
+}
+
+static const struct vfio_device_ops vfio_platform_ops = {
+	.name		= "vfio-platform",
+	.open		= vfio_platform_open,
+	.release	= vfio_platform_release,
+	.ioctl		= vfio_platform_ioctl,
+	.read		= vfio_platform_read,
+	.write		= vfio_platform_write,
+	.mmap		= vfio_platform_mmap,
+};
+
+int vfio_platform_probe_common(struct vfio_platform_device *vdev,
+  struct device *dev)
+{
+   struct iommu_group *group;
+   int ret;
+
+   if (!vdev)
+   return -EINVAL;
+
+	group = iommu_group_get(dev);
+	if (!group) {
+		pr_err("VFIO: No IOMMU group for device %s\n", vdev->name);
+		return -EINVAL;
+	}
+
+	ret = vfio_add_group_dev(dev, &vfio_platform_ops, vdev);
+	if (ret) {
+		iommu_group_put(group);
+		return ret;
+	}
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_platform_probe_common);
+
+int vfio_platform_remove_common(struct device *dev)
+{
+   struct vfio_platform_device *vdev;
+
+   vdev = vfio_del_group_dev(dev);
+   if (!vdev)
+   return -EINVAL;
+
+	iommu_group_put(dev->iommu_group);
+   kfree(vdev);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
new file mode 100644
index 000..ef76737
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis a.mota...@virtualopensystems.com
+ *
+ * This program is free software; you can redistribute it

[PATCHv7 11/26] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-09-23 Thread Antonios Motakis

Enable building the VFIO PLATFORM driver, which allows Linux platform
devices to be used with VFIO.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/Kconfig   | 1 +
 drivers/vfio/Makefile  | 1 +
 drivers/vfio/platform/Kconfig  | 9 +
 drivers/vfio/platform/Makefile | 4 
 4 files changed, 15 insertions(+)
 create mode 100644 drivers/vfio/platform/Kconfig
 create mode 100644 drivers/vfio/platform/Makefile

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a0abe04..962fb80 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -27,3 +27,4 @@ menuconfig VFIO
  If you don't know what to do here, say N.
 
 source "drivers/vfio/pci/Kconfig"
+source "drivers/vfio/platform/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 0b035b1..dadf0ca 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
 obj-$(CONFIG_VFIO_PCI) += pci/
+obj-$(CONFIG_VFIO_PLATFORM) += platform/
diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
new file mode 100644
index 000..c51af17
--- /dev/null
+++ b/drivers/vfio/platform/Kconfig
@@ -0,0 +1,9 @@
+config VFIO_PLATFORM
+	tristate "VFIO support for platform devices"
+	depends on VFIO && EVENTFD && ARM
+   help
+ Support for platform devices with VFIO. This is required to make
+ use of platform devices present on the system using the VFIO
+ framework.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
new file mode 100644
index 000..279862b
--- /dev/null
+++ b/drivers/vfio/platform/Makefile
@@ -0,0 +1,4 @@
+
+vfio-platform-y := vfio_platform.o vfio_platform_common.o
+
+obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
-- 
1.8.3.2


[PATCHv7 12/26] vfio: amba: VFIO support for AMBA devices

2014-09-23 Thread Antonios Motakis

Add support for discovering AMBA devices with VFIO and handle them
similarly to Linux platform devices.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/platform/vfio_amba.c | 108 ++
 include/uapi/linux/vfio.h |   1 +
 2 files changed, 109 insertions(+)
 create mode 100644 drivers/vfio/platform/vfio_amba.c

diff --git a/drivers/vfio/platform/vfio_amba.c 
b/drivers/vfio/platform/vfio_amba.c
new file mode 100644
index 000..e8c5e1a
--- /dev/null
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis a.mota...@virtualopensystems.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/amba/bus.h>
+
+#include "vfio_platform_private.h"
+
+#define DRIVER_VERSION  "0.7"
+#define DRIVER_AUTHOR   "Antonios Motakis <a.mota...@virtualopensystems.com>"
+#define DRIVER_DESC     "VFIO for AMBA devices - User Level meta-driver"
+
+/* probing devices from the AMBA bus */
+
+static struct resource *get_amba_resource(struct vfio_platform_device *vdev,
+						int i)
+{
+	struct amba_device *adev = (struct amba_device *) vdev->opaque;
+
+	if (i == 0)
+		return &adev->res;
+
+	return NULL;
+}
+
+static int get_amba_irq(struct vfio_platform_device *vdev, int i)
+{
+	struct amba_device *adev = (struct amba_device *) vdev->opaque;
+
+	if (i < AMBA_NR_IRQS)
+		return adev->irq[i];
+
+	return 0;
+}
+
+static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
+{
+
+   struct vfio_platform_device *vdev;
+   int ret;
+
+   vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+   if (!vdev)
+   return -ENOMEM;
+
+	vdev->opaque = (void *) adev;
+	vdev->name = "vfio-amba-dev";
+	vdev->flags = VFIO_DEVICE_FLAGS_AMBA;
+	vdev->get_resource = get_amba_resource;
+	vdev->get_irq = get_amba_irq;
+
+	ret = vfio_platform_probe_common(vdev, &adev->dev);
+   if (ret)
+   kfree(vdev);
+
+   return ret;
+}
+
+static int vfio_amba_remove(struct amba_device *adev)
+{
+	return vfio_platform_remove_common(&adev->dev);
+}
+
+static struct amba_id pl330_ids[] = {
+   { 0, 0 },
+};
+
+MODULE_DEVICE_TABLE(amba, pl330_ids);
+
+static struct amba_driver vfio_amba_driver = {
+	.probe = vfio_amba_probe,
+	.remove = vfio_amba_remove,
+	.id_table = pl330_ids,
+	.drv = {
+		.name = "vfio-amba",
+		.owner = THIS_MODULE,
+	},
+};
+
+module_amba_driver(vfio_amba_driver);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index b022a25..72f121f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -159,6 +159,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_RESET	(1 << 0)	/* Device supports reset */
 #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* vfio-platform device */
+#define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
 };
-- 
1.8.3.2
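
The split between common VFIO platform code and bus-specific discovery relies
on callbacks plus an opaque pointer, as in get_amba_resource() above. The
following userspace sketch shows the shape of that design under simplified
assumptions; all demo_* types and functions are hypothetical and only mirror
the structure of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct resource. */
struct demo_resource {
    unsigned long start;
    unsigned long end;
};

/* Common device: bus-specific state hides behind an opaque pointer. */
struct demo_vdev {
    void *opaque;
    struct demo_resource *(*get_resource)(struct demo_vdev *vdev, int i);
};

/* Hypothetical AMBA-like device exposing exactly one memory resource. */
struct demo_amba_dev {
    struct demo_resource res;
};

static struct demo_resource *demo_amba_get_resource(struct demo_vdev *vdev,
                                                    int i)
{
    struct demo_amba_dev *adev = vdev->opaque;

    return i == 0 ? &adev->res : NULL;
}

/* Common code: count regions without knowing which bus backs the device. */
static int demo_count_regions(struct demo_vdev *vdev)
{
    int cnt = 0;

    while (vdev->get_resource(vdev, cnt))
        cnt++;

    return cnt;
}
```

Because the common code only calls through the function pointer, a second
bus can be supported by supplying another callback without touching the
region counting logic.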


[PATCHv7 05/26] vfio: introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag

2014-09-23 Thread Antonios Motakis

We introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag to the VFIO dma map call,
and expose its availability via the capability VFIO_IOMMU_PROT_NOEXEC.
This way the user can control whether the XN flag will be set on the
requested mappings. The IOMMU_NOEXEC flag needs to be available for all
the IOMMUs of the container used.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 include/uapi/linux/vfio.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6612974..30f630c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -29,6 +29,7 @@
  * capability is subject to change as groups are added or removed.
  */
 #define VFIO_DMA_CC_IOMMU  4
+#define VFIO_IOMMU_PROT_NOEXEC 5
 
 /* Check if EEH is supported */
 #define VFIO_EEH   5
@@ -401,6 +402,7 @@ struct vfio_iommu_type1_dma_map {
__u32   flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)	/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+#define VFIO_DMA_MAP_FLAG_NOEXEC (1 << 2)	/* not executable from device */
__u64   vaddr;  /* Process virtual address */
__u64   iova;   /* IO virtual address */
__u64   size;   /* Size of mapping (bytes) */
-- 
1.8.3.2
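
A caller would combine the new flag with the existing read and write flags
as a bitmask. The sketch below uses hypothetical DEMO_* constants that copy
the bit positions from this patch, and a hypothetical validity check that
only rejects unknown bits; the real ioctl applies further checks that are
not modelled here.

```c
#include <stdint.h>

/* Bit positions as in the patch; the DEMO_ names are illustrative only. */
#define DEMO_MAP_FLAG_READ   (1u << 0)  /* readable from device */
#define DEMO_MAP_FLAG_WRITE  (1u << 1)  /* writable from device */
#define DEMO_MAP_FLAG_NOEXEC (1u << 2)  /* not executable from device */

/* Returns 1 if flags contains only known bits, 0 otherwise. */
static int demo_map_flags_known(uint32_t flags)
{
    const uint32_t known = DEMO_MAP_FLAG_READ |
                           DEMO_MAP_FLAG_WRITE |
                           DEMO_MAP_FLAG_NOEXEC;

    return (flags & ~known) == 0;
}
```

A read-only, non-executable mapping would then be requested with
DEMO_MAP_FLAG_READ | DEMO_MAP_FLAG_NOEXEC.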


[PATCHv7 08/26] driver core: amba: add documentation for binding path 'driver_override'

2014-09-23 Thread Antonios Motakis

Add documentation for alternative binding path 'driver_override' for
AMBA devices.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 Documentation/ABI/testing/sysfs-bus-amba | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-amba

diff --git a/Documentation/ABI/testing/sysfs-bus-amba 
b/Documentation/ABI/testing/sysfs-bus-amba
new file mode 100644
index 000..e7b5467
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-amba
@@ -0,0 +1,20 @@
+What:  /sys/bus/amba/devices/.../driver_override
+Date:  September 2014
+Contact:   Antonios Motakis a.mota...@virtualopensystems.com
+Description:
+   This file allows the driver for a device to be specified which
+   will override standard OF, ACPI, ID table, and name matching.
+   When specified, only a driver with a name matching the value
+   written to driver_override will have an opportunity to bind to
+   the device. The override is specified by writing a string to the
+		driver_override file (echo vfio-amba > driver_override) and may
+		be cleared with an empty string (echo > driver_override).
+   This returns the device to standard matching rules binding.
+   Writing to driver_override does not automatically unbind the
+   device from its current driver or make any attempt to
+   automatically load the specified driver. If no driver with a
+   matching name is currently loaded in the kernel, the device will
+   not bind to any driver. This also allows devices to opt-out of
+		driver binding using a driver_override name such as "none".
+   Only a single driver may be specified in the override, there is
+   no support for parsing delimiters.
-- 
1.8.3.2
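
The echo commands in the description above amount to writing a short string
to a sysfs attribute. A minimal C sketch of the same operation follows. The
function name demo_set_driver_override is hypothetical, and the caller is
assumed to supply the full path of the driver_override file for the device
in question; the sketch does not unbind or bind anything by itself.

```c
#include <stdio.h>

/*
 * Write a driver name to a driver_override attribute. Passing an empty
 * string writes just a newline, which is what a bare echo would send.
 * Returns 0 on success and -1 on any error.
 */
static int demo_set_driver_override(const char *path, const char *driver)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;

    ok = fprintf(f, "%s\n", driver) >= 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

As the description notes, the write only records the override; the device
still has to be unbound from its current driver and reprobed separately.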


[PATCHv7 06/26] vfio/iommu_type1: implement the VFIO_DMA_MAP_FLAG_NOEXEC flag

2014-09-23 Thread Antonios Motakis

Some IOMMU drivers, such as the ARM SMMU driver, make available the
IOMMU_NOEXEC flag, to set the page tables for a device as XN (execute never).
This affects devices such as the ARM PL330 DMA Controller, which respects
this flag and will refuse to fetch DMA instructions from memory where the
XN flag has been set.

The flag can be used only if all IOMMU domains behind the container support
the IOMMU_NOEXEC flag. Also, if any mappings are created with the flag, any
new domains with devices will have to support it as well.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/vfio_iommu_type1.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0734fbe..09e5064 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -81,6 +81,26 @@ struct vfio_group {
 };
 
 /*
+ * This function returns true only if _all_ domains support the capability.
+ */
+static int vfio_all_domains_have_iommu_noexec(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *d;
+	int ret = 1;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		if (!iommu_domain_has_cap(d->domain, IOMMU_CAP_NOEXEC)) {
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
+/*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
  */
@@ -546,6 +566,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
prot |= IOMMU_WRITE;
 	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
 		prot |= IOMMU_READ;
+	if (map->flags & VFIO_DMA_MAP_FLAG_NOEXEC) {
+		if (!vfio_all_domains_have_iommu_noexec(iommu))
+			return -EINVAL;
+		prot |= IOMMU_NOEXEC;
+	}
 
 	if (!prot || !size || (size | iova | vaddr) & mask)
return -EINVAL;
@@ -636,6 +661,12 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 		dma = rb_entry(n, struct vfio_dma, node);
 		iova = dma->iova;
 
+		/* if any of the mappings to be replayed has the NOEXEC flag
+		 * set, then the new iommu domain must support it */
+		if ((dma->prot & IOMMU_NOEXEC) &&
+		    !iommu_domain_has_cap(domain->domain, IOMMU_CAP_NOEXEC))
+			return -EINVAL;
+
 		while (iova < dma->iova + dma->size) {
 			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
 			size_t size;
@@ -890,6 +921,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
if (!iommu)
return 0;
return vfio_domains_have_iommu_cache(iommu);
+   case VFIO_IOMMU_PROT_NOEXEC:
+   if (!iommu)
+   return 0;
+   return vfio_all_domains_have_iommu_noexec(iommu);
default:
return 0;
}
@@ -913,7 +948,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
} else if (cmd == VFIO_IOMMU_MAP_DMA) {
struct vfio_iommu_type1_dma_map map;
uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-   VFIO_DMA_MAP_FLAG_WRITE;
+   VFIO_DMA_MAP_FLAG_WRITE |
+   VFIO_DMA_MAP_FLAG_NOEXEC;
 
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
-- 
1.8.3.2
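
The rule this patch enforces (NOEXEC may be requested only when every domain behind the container supports it) can be sketched in plain userspace C. This is a minimal sketch under stated assumptions, not the kernel code: struct demo_domain, the DEMO_* constants and both helper names are hypothetical stand-ins for the vfio/iommu structures touched by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IOMMU protection flags. */
#define DEMO_PROT_READ   (1 << 0)
#define DEMO_PROT_WRITE  (1 << 1)
#define DEMO_PROT_NOEXEC (1 << 3)

/* Hypothetical stand-in for an IOMMU domain; it only records NOEXEC support. */
struct demo_domain {
	int supports_noexec;
};

/*
 * Returns 1 only if every domain supports NOEXEC. An empty list yields 1,
 * mirroring the patch, where the result starts out as 1.
 */
static int demo_all_domains_have_noexec(const struct demo_domain *d, size_t n)
{
	size_t i;

	assert(d != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!d[i].supports_noexec)
			return 0;
	return 1;
}

/*
 * Builds the protection mask for a map request. Returns -EINVAL when NOEXEC
 * is requested but unsupported, or when no protection bit ends up set.
 */
static int demo_build_prot(const struct demo_domain *d, size_t n,
			   int want_read, int want_write, int want_noexec)
{
	int prot = 0;

	if (want_write)
		prot |= DEMO_PROT_WRITE;
	if (want_read)
		prot |= DEMO_PROT_READ;
	if (want_noexec) {
		if (!demo_all_domains_have_noexec(d, n))
			return -EINVAL;
		prot |= DEMO_PROT_NOEXEC;
	}
	return prot ? prot : -EINVAL;
}
```

The capability check comes before the flag is added to the mask, so a request that cannot be honoured by every domain fails before any mapping is attempted.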


[PATCHv7 07/26] driver core: amba: add device binding path 'driver_override'

2014-09-23 Thread Antonios Motakis

As already demonstrated with PCI [1] and the platform bus [2], a
driver_override property in sysfs can be used to bypass the ID matching
of a device to an AMBA driver. This can be used by VFIO to bind to any AMBA
device requested by the user.

[1] http://lists-archives.com/linux-kernel/28030441-pci-introduce-new-device-binding-path-using-pci_dev-driver_override.html
[2] https://www.redhat.com/archives/libvir-list/2014-April/msg00382.html

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/amba/bus.c   | 44 
 include/linux/amba/bus.h |  1 +
 2 files changed, 45 insertions(+)

diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 3cf61a1..473177c 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -17,6 +17,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/amba/bus.h>
 #include <linux/sizes.h>
+#include <linux/limits.h>
 
 #include <asm/irq.h>
 
@@ -42,6 +43,10 @@ static int amba_match(struct device *dev, struct device_driver *drv)
 	struct amba_device *pcdev = to_amba_device(dev);
 	struct amba_driver *pcdrv = to_amba_driver(drv);
 
+	/* When driver_override is set, only bind to the matching driver */
+	if (pcdev->driver_override)
+		return !strcmp(pcdev->driver_override, drv->name);
+
 	return amba_lookup(pcdrv->id_table, pcdev) != NULL;
 }
 
@@ -58,6 +63,44 @@ static int amba_uevent(struct device *dev, struct kobj_uevent_env *env)
 	return retval;
 }
 
+static ssize_t driver_override_show(struct device *_dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct amba_device *dev = to_amba_device(_dev);
+
+	return sprintf(buf, "%s\n", dev->driver_override);
+}
+
+static ssize_t driver_override_store(struct device *_dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	struct amba_device *dev = to_amba_device(_dev);
+	char *driver_override, *old = dev->driver_override, *cp;
+
+	if (count > PATH_MAX)
+		return -EINVAL;
+
+	driver_override = kstrndup(buf, count, GFP_KERNEL);
+	if (!driver_override)
+		return -ENOMEM;
+
+	cp = strchr(driver_override, '\n');
+	if (cp)
+		*cp = '\0';
+
+	if (strlen(driver_override)) {
+		dev->driver_override = driver_override;
+	} else {
+		kfree(driver_override);
+		dev->driver_override = NULL;
+	}
+
+	kfree(old);
+
+	return count;
+}
+
 #define amba_attr_func(name,fmt,arg...)					\
 static ssize_t name##_show(struct device *_dev,				\
 			   struct device_attribute *attr, char *buf)	\
@@ -80,6 +123,7 @@ amba_attr_func(resource, "\t%016llx\t%016llx\t%016lx\n",
 static struct device_attribute amba_dev_attrs[] = {
 	__ATTR_RO(id),
 	__ATTR_RO(resource),
+	__ATTR_RW(driver_override),
 	__ATTR_NULL,
 };
 
diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index fdd7e1b..7c011e7 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -32,6 +32,7 @@ struct amba_device {
 	struct clk		*pclk;
 	unsigned int		periphid;
 	unsigned int		irq[AMBA_NR_IRQS];
+	char			*driver_override;
 };
 
 struct amba_driver {
-- 
1.8.3.2
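
The store and match behaviour added by this patch can be tried outside the kernel with a small userspace sketch. Everything named demo_* below is hypothetical; malloc/free stand in for kstrndup/kfree, and the id_table_matches argument stands in for the result of the normal ID-table lookup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bus device carrying an override string. */
struct demo_device {
	char *driver_override;	/* NULL means "use normal ID matching" */
};

/*
 * Mirrors the store logic: copy the input, cut it at the first newline,
 * and treat an empty result as "clear the override".
 * Returns 0 on success, -1 if the allocation fails.
 */
static int demo_set_override(struct demo_device *dev, const char *buf,
			     size_t count)
{
	char *copy, *nl, *old;

	assert(dev != NULL && buf != NULL);
	old = dev->driver_override;

	copy = malloc(count + 1);
	if (!copy)
		return -1;
	memcpy(copy, buf, count);
	copy[count] = '\0';

	nl = strchr(copy, '\n');
	if (nl)
		*nl = '\0';

	if (copy[0] != '\0') {
		dev->driver_override = copy;
	} else {
		free(copy);
		dev->driver_override = NULL;
	}

	free(old);
	return 0;
}

/*
 * Mirrors the match logic: when an override is set, only the driver with
 * that exact name matches; otherwise the ID-table result decides.
 */
static int demo_match(const struct demo_device *dev, const char *drv_name,
		      int id_table_matches)
{
	if (dev->driver_override)
		return strcmp(dev->driver_override, drv_name) == 0;
	return id_table_matches;
}
```

With an override in place, a driver whose ID table would normally match is rejected, which is what lets a generic driver claim the device instead.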


[PATCHv7 04/26] vfio/iommu_type1: support for platform bus devices on ARM

2014-09-23 Thread Antonios Motakis

This allows the VFIO_IOMMU_TYPE1 driver to be used with platform
devices on ARM. The driver can then be used with an Exynos SMMU or
ARM SMMU driver.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/vfio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index d8c5763..a0abe04 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -16,7 +16,7 @@ config VFIO_SPAPR_EEH
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if X86
+   select VFIO_IOMMU_TYPE1 if X86 || ARM
select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
select VFIO_SPAPR_EEH if (PPC_POWERNV || PPC_PSERIES)
select ANON_INODES
-- 
1.8.3.2


[PATCHv7 03/26] iommu/arm-smmu: add IOMMU_CAP_NOEXEC to the ARM SMMU driver

2014-09-23 Thread Antonios Motakis

The ARM SMMU supports the IOMMU_NOEXEC protection flag. Add the
corresponding IOMMU capability.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/iommu/arm-smmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c7cbdda..7c0fa25 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1539,6 +1539,8 @@ static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
 		return features & ARM_SMMU_FEAT_COHERENT_WALK;
case IOMMU_CAP_INTR_REMAP:
return 1; /* MSIs are just memory writes */
+   case IOMMU_CAP_NOEXEC:
+   return 1;
default:
return 0;
}
-- 
1.8.3.2


[PATCHv7 02/26] iommu: add capability IOMMU_CAP_NOEXEC

2014-09-23 Thread Antonios Motakis

Some IOMMUs accept an IOMMU_NOEXEC protection flag in addition to
IOMMU_READ and IOMMU_WRITE. Expose this as an IOMMU capability.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e1a644c..0433553 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -59,6 +59,7 @@ struct iommu_domain {
 
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
 #define IOMMU_CAP_INTR_REMAP   0x2 /* isolates device intrs */
+#define IOMMU_CAP_NOEXEC   0x3 /* IOMMU_NOEXEC flag */
 
 /*
  * Following constraints are specifc to FSL_PAMUV1:
-- 
1.8.3.2


[PATCHv7 01/26] iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC

2014-09-23 Thread Antonios Motakis

Exposing the XN flag of the SMMU driver as IOMMU_NOEXEC instead of
IOMMU_EXEC makes it enforceable, since for IOMMUs that don't support
the XN flag pages will always be executable.

Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
---
 drivers/iommu/arm-smmu.c | 9 +
 include/linux/iommu.h| 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a83cc2a..c7cbdda 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1252,7 +1252,7 @@ static int arm_smmu_alloc_init_pte(struct arm_smmu_device *smmu, pmd_t *pmd,
 				   unsigned long pfn, int prot, int stage)
 {
 	pte_t *pte, *start;
-	pteval_t pteval = ARM_SMMU_PTE_PAGE | ARM_SMMU_PTE_AF | ARM_SMMU_PTE_XN;
+	pteval_t pteval = ARM_SMMU_PTE_PAGE | ARM_SMMU_PTE_AF;
 
 	if (pmd_none(*pmd)) {
 		/* Allocate a new set of tables */
@@ -1286,10 +1286,11 @@ static int arm_smmu_alloc_init_pte(struct arm_smmu_device *smmu, pmd_t *pmd,
 			pteval |= ARM_SMMU_PTE_MEMATTR_NC;
 	}
 
+	if (prot & IOMMU_NOEXEC)
+		pteval |= ARM_SMMU_PTE_XN;
+
 	/* If no access, create a faulting entry to avoid TLB fills */
-	if (prot & IOMMU_EXEC)
-		pteval &= ~ARM_SMMU_PTE_XN;
-	else if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+	if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
 		pteval &= ~ARM_SMMU_PTE_PAGE;
 
 	pteval |= ARM_SMMU_PTE_SH_IS;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 20f9a52..e1a644c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -27,7 +27,7 @@
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
 #define IOMMU_CACHE	(1 << 2) /* DMA cache coherency */
-#define IOMMU_EXEC	(1 << 3)
+#define IOMMU_NOEXEC	(1 << 3)
 
 struct iommu_ops;
 struct iommu_group;
-- 
1.8.3.2
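
The effect of switching from IOMMU_EXEC to IOMMU_NOEXEC on the page table entry can be sketched as a pure function. This is an illustration only: the DEMO_PTE_* bit positions are made up for the example and are not the real ARM SMMU encodings, and demo_pteval_from_prot is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Protection flags with the same bit layout as in the patch. */
#define DEMO_IOMMU_READ   (1 << 0)
#define DEMO_IOMMU_WRITE  (1 << 1)
#define DEMO_IOMMU_NOEXEC (1 << 3)

/* Hypothetical PTE bit positions, chosen only for illustration. */
#define DEMO_PTE_PAGE (UINT64_C(1) << 0)
#define DEMO_PTE_AF   (UINT64_C(1) << 1)
#define DEMO_PTE_XN   (UINT64_C(1) << 2)

/*
 * After the patch, pages are executable by default and XN is set only on
 * request. A mapping with neither read nor write access becomes a faulting
 * entry by clearing the PAGE bit.
 */
static uint64_t demo_pteval_from_prot(int prot)
{
	uint64_t pteval = DEMO_PTE_PAGE | DEMO_PTE_AF;

	if (prot & DEMO_IOMMU_NOEXEC)
		pteval |= DEMO_PTE_XN;

	if (!(prot & (DEMO_IOMMU_READ | DEMO_IOMMU_WRITE)))
		pteval &= ~DEMO_PTE_PAGE;

	return pteval;
}
```

Because the default is now "executable", an IOMMU without XN support behaves the same as one where the flag is never requested, which is the enforceability argument made in the commit message.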


Re: [PATCH v6 3/3] ivshmem: add check on protocol version in QEMU

2014-09-23 Thread Stefan Hajnoczi

On Mon, Sep 08, 2014 at 12:49:48PM +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 08, 2014 at 11:17:50AM +0200, David Marchand wrote:
  Send a protocol version as the first message from server, clients must close
  communication if they don't support this protocol version.
 
 What's the motivation here?
 This is at best a way to break all clients if an incompatible
 change in the server is made.
 Would not it be better to send a bitmap, or a list of supported
 versions, so it's possible to write servers compatible
 with multiple clients?

I'm not sure a full-fledged feature negotiation system is needed.  The
ivshmem protocol is local to the host and all participants are under
control of the administrator.

I suggested a protocol version to protect against misconfiguration.  For
example, building QEMU from source but talking to an outdated ivshmem
server that is still running from before.

Remember that ivshmem-server and QEMU are shipped together by the
distro.  So in 99% of the cases they will have the same version anyway.
But we want to protect against rare misconfigurations that break things
(user mixing and matching incompatible software).

The only reason I can see for fancy negotiation is to make life easier
for proprietary third-party software, which I don't care about or like.

Stefan
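
A minimal sketch of the single-version check argued for above, assuming the server's first message carries one integer version. The constant, the enum and the function name are hypothetical and are not taken from the ivshmem sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical version this client was built against. */
#define DEMO_PROTOCOL_VERSION 0

enum demo_handshake {
	DEMO_HANDSHAKE_OK,	/* versions agree, keep talking */
	DEMO_HANDSHAKE_CLOSE	/* mismatch, the client must disconnect */
};

/*
 * Decide what to do with the first message received from the server.
 * No negotiation takes place: any version other than the one the client
 * was built for is treated as a misconfiguration.
 */
static enum demo_handshake demo_check_first_message(int64_t server_version)
{
	if (server_version != DEMO_PROTOCOL_VERSION)
		return DEMO_HANDSHAKE_CLOSE;
	return DEMO_HANDSHAKE_OK;
}
```

The check is deliberately strict; it only has to catch a stale server or client left over from an earlier build.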



Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Andres Lagar-Cavilla

On Tue, Sep 23, 2014 at 12:49 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 22/09/2014 23:54, Andres Lagar-Cavilla wrote:
 @@ -1406,32 +1406,24 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned 
 long *rmapp,
   struct rmap_iterator uninitialized_var(iter);
   int young = 0;

 - /*
 -  * In case of absence of EPT Access and Dirty Bits supports,
 -  * emulate the accessed bit for EPT, by checking if this page has
 -  * an EPT mapping, and clearing it if it does. On the next access,
 -  * a new EPT mapping will be established.
 -  * This has some overhead, but not as much as the cost of swapping
 -  * out actively used pages or breaking up actively used hugepages.
 -  */
 - if (!shadow_accessed_mask) {
 - young = kvm_unmap_rmapp(kvm, rmapp, slot, data);
 - goto out;
 - }
 + BUG_ON(!shadow_accessed_mask);

   for (sptep = rmap_get_first(*rmapp, iter); sptep;
sptep = rmap_get_next(iter)) {
 + struct kvm_mmu_page *sp;
 + gfn_t gfn;
   BUG_ON(!is_shadow_present_pte(*sptep));
 + /* From spte to gfn. */
 + sp = page_header(__pa(sptep));
 + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt);

   if (*sptep  shadow_accessed_mask) {
   young = 1;
   clear_bit((ffs(shadow_accessed_mask) - 1),
(unsigned long *)sptep);
   }
 + trace_kvm_age_page(gfn, slot, young);

 Yesterday I couldn't think of a way to avoid the
 page_header/kvm_mmu_page_get_gfn on every iteration, but it's actually
 not hard.  Instead of passing hva as datum, you can pass (unsigned long)
 start.  Then you can add PAGE_SIZE to it at the end of every call to
 kvm_age_rmapp, and keep the old tracing logic.

I'm not sure. The addition is not always by PAGE_SIZE, since it
depends on the current level we are iterating at in the outer
kvm_handle_hva_range(). IOW, it could be PMD_SIZE or even PUD_SIZE, and
is is_large_pte() enough to tell?

This is probably worth a general fix, I can see all the callbacks
benefiting from knowing the gfn (passed down by
kvm_handle_hva_range()) without any additional computation, and adding
that to a tracing call if they don't already.

Even passing the level down to the callback would help by cutting down
to one arithmetic op (subtract rmapp from slot rmap base pointer for
that level)

Andres


 Paolo



-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com
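
The point about the step depending on the level can be made concrete with a little arithmetic. The sketch below assumes x86-style paging where each level covers 9 more bits of guest frame number; the DEMO_* macro and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each page-table level spans 9 more gfn bits (x86 style). */
#define DEMO_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Number of 4 KiB guest frames covered by one entry at the given level. */
static uint64_t demo_gfn_stride(int level)
{
	assert(level >= 1);
	return UINT64_C(1) << DEMO_HPAGE_GFN_SHIFT(level);
}

/*
 * gfn handed to the callback for the idx-th rmap entry at this level,
 * when the walk starts at gfn_start and advances by one stride per entry.
 */
static uint64_t demo_gfn_at(uint64_t gfn_start, int level, uint64_t idx)
{
	return gfn_start + idx * demo_gfn_stride(level);
}
```

Level 1 advances one frame per entry, level 2 advances 512 and level 3 advances 262144, so a fixed PAGE_SIZE increment would only be right for the lowest level.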

Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 19:04, Andres Lagar-Cavilla wrote:
 I'm not sure. The addition is not always by PAGE_SIZE, since it
 depends on the current level we are iterating at in the outer
 kvm_handle_hva_range(). IOW, could be PMD_SIZE or even PUD_SIZE, and
 is_large_pte() enough to tell?
 
 This is probably worth a general fix, I can see all the callbacks
 benefiting from knowing the gfn (passed down by
 kvm_handle_hva_range()) without any additional computation, and adding
 that to a tracing call if they don't already.
 
 Even passing the level down to the callback would help by cutting down
 to one arithmetic op (subtract rmapp from slot rmap base pointer for
 that level)

You're right.  Let's apply this patch and work on that as a follow-up.

Paolo

[PATCH 1/2] kvm: Fix powerpc compile slippage.

2014-09-23 Thread Andres Lagar-Cavilla

After "kvm: Fix page ageing bugs"

Signed-off-by: Andres Lagar-Cavilla andre...@google.com
---
 arch/powerpc/include/asm/kvm_ppc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..d4a92d7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -243,7 +243,7 @@ struct kvmppc_ops {
int (*unmap_hva)(struct kvm *kvm, unsigned long hva);
int (*unmap_hva_range)(struct kvm *kvm, unsigned long start,
   unsigned long end);
-   int (*age_hva)(struct kvm *kvm, unsigned long hva);
+   int (*age_hva)(struct kvm *kvm, unsigned long start, unsigned long end);
int (*test_age_hva)(struct kvm *kvm, unsigned long hva);
void (*set_spte_hva)(struct kvm *kvm, unsigned long hva, pte_t pte);
void (*mmu_destroy)(struct kvm_vcpu *vcpu);
-- 
2.1.0.rc2.206.gedb03e5


Re: [PATCH 1/2] kvm: Fix powerpc compile slippage.

2014-09-23 Thread Andres Lagar-Cavilla

On Tue, Sep 23, 2014 at 10:58 AM, Andres Lagar-Cavilla
andre...@google.com wrote:
 After kvm: Fix page ageing bugs

 Signed-off-by: Andres Lagar-Cavilla andre...@google.com

I can resend without the 1/2 (git n00b growing pains). Otherwise that
should fix the kbuild error. Apologies for that.
Andres

 ---
  arch/powerpc/include/asm/kvm_ppc.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index fb86a22..d4a92d7 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -243,7 +243,7 @@ struct kvmppc_ops {
 int (*unmap_hva)(struct kvm *kvm, unsigned long hva);
 int (*unmap_hva_range)(struct kvm *kvm, unsigned long start,
unsigned long end);
 -   int (*age_hva)(struct kvm *kvm, unsigned long hva);
 +   int (*age_hva)(struct kvm *kvm, unsigned long start, unsigned long 
 end);
 int (*test_age_hva)(struct kvm *kvm, unsigned long hva);
 void (*set_spte_hva)(struct kvm *kvm, unsigned long hva, pte_t pte);
 void (*mmu_destroy)(struct kvm_vcpu *vcpu);
 --
 2.1.0.rc2.206.gedb03e5




-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com

Re: [PATCH 1/2] kvm: Fix powerpc compile slippage.

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 20:03, Andres Lagar-Cavilla wrote:
 I can resend without the 1/2 (git n00b growing pains). Otherwise that
 should fix the kbuild error. Apologies for that.

No problem, kvm/queue is there for this reason.  It can be rebased, so
I'll just squash this into the patch.

Paolo

Using physical disks in a VM

2014-09-23 Thread Boylan, Ross

I have a couple of SATA disks (software RAID on some partitions) with a system 
that has become unreliable on them.  I plan to add more disks and install a new 
system on them (Debian wheezy).  I will still need to run things on the old OS 
(simultaneously running the new OS) to migrate.

Is there a way I can use the 2 old physical disks in a VM?  How?

Searching suggests that I can't simply say -hda /dev/sdc.  There are some 
references to using virtio, but it isn't clear to me if this would enable me to 
use the physical disk as is.

Thanks for any help.  cc's appreciated, though I'm subscribed with my old
address.

Ross Boylan

KVM: remove Kconfig symbol KVM_VFIO?

2014-09-23 Thread Paul Bolle

Will,

Your commit 80ce1639727e ("KVM: VFIO: register kvm_device_ops
dynamically") is included in linux-next since next-20140918. It removes
the last usage of CONFIG_KVM_VFIO. After that commit setting KVM_VFIO is
pointless.

Is the patch to remove the Kconfig symbol KVM_VFIO from the tree queued
somewhere? If not, should I submit a trivial patch that does that?


Paul Bolle


[PATCH] kvm/x86/mmu: Pass gfn and level to rmapp callback.

2014-09-23 Thread Andres Lagar-Cavilla

Callbacks don't have to do extra computation to learn what the caller
(kvm_handle_hva_range()) knows very well. Useful for
debugging/tracing/printk/future.

Signed-off-by: Andres Lagar-Cavilla andre...@google.com
---
 arch/x86/kvm/mmu.c | 38 ++
 include/trace/events/kvm.h | 10 ++
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f33d5e4..cc14eba 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1262,7 +1262,8 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 }
 
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
-  struct kvm_memory_slot *slot, unsigned long data)
+  struct kvm_memory_slot *slot, gfn_t gfn, int level,
+  unsigned long data)
 {
u64 *sptep;
struct rmap_iterator iter;
@@ -1270,7 +1271,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);
+		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
+			    sptep, *sptep, gfn, level);
 
drop_spte(kvm, sptep);
need_tlb_flush = 1;
@@ -1280,7 +1282,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
 }
 
 static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
-struct kvm_memory_slot *slot, unsigned long data)
+struct kvm_memory_slot *slot, gfn_t gfn, int level,
+unsigned long data)
 {
u64 *sptep;
struct rmap_iterator iter;
@@ -1294,7 +1297,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		BUG_ON(!is_shadow_present_pte(*sptep));
-		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);
+		rmap_printk("kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n",
+			    sptep, *sptep, gfn, level);
 
need_flush = 1;
 
@@ -1328,6 +1332,8 @@ static int kvm_handle_hva_range(struct kvm *kvm,
int (*handler)(struct kvm *kvm,
   unsigned long *rmapp,
   struct kvm_memory_slot *slot,
+  gfn_t gfn,
+  int level,
   unsigned long data))
 {
int j;
@@ -1357,6 +1363,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 		     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
 			unsigned long idx, idx_end;
 			unsigned long *rmapp;
+			gfn_t gfn = gfn_start;
 
 			/*
 			 * {idx(page_j) | page_j intersects with
@@ -1367,8 +1374,10 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 
 			rmapp = __gfn_to_rmap(gfn_start, j, memslot);
 
-			for (; idx <= idx_end; ++idx)
-				ret |= handler(kvm, rmapp++, memslot, data);
+			for (; idx <= idx_end;
+			       ++idx, gfn += (1UL << KVM_HPAGE_GFN_SHIFT(j)))
+				ret |= handler(kvm, rmapp++, memslot,
+					       gfn, j, data);
 		}
 	}
 
@@ -1379,6 +1388,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long 
hva,
  unsigned long data,
  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
 struct kvm_memory_slot *slot,
+gfn_t gfn, int level,
 unsigned long data))
 {
return kvm_handle_hva_range(kvm, hva, hva + 1, data, handler);
@@ -1400,7 +1410,8 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, 
pte_t pte)
 }
 
 static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
-struct kvm_memory_slot *slot, unsigned long data)
+struct kvm_memory_slot *slot, gfn_t gfn, int level,
+unsigned long data)
 {
u64 *sptep;
struct rmap_iterator uninitialized_var(iter);
@@ -1410,25 +1421,20 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
 
for (sptep = rmap_get_first(*rmapp, iter); sptep;
 sptep = rmap_get_next(iter)) {
-   struct kvm_mmu_page *sp;
-   gfn_t gfn;
BUG_ON(!is_shadow_present_pte(*sptep));
-   /* From

Re: [PATCH] kvm/x86/mmu: Pass gfn and level to rmapp callback.

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 21:34, Andres Lagar-Cavilla wrote:
 Callbacks don't have to do extra computation to learn what the caller
 (lvm_handle_hva_range()) knows very well. Useful for
 debugging/tracing/printk/future.
 
 Signed-off-by: Andres Lagar-Cavilla andre...@google.com
 ---
  arch/x86/kvm/mmu.c | 38 ++
  include/trace/events/kvm.h | 10 ++
  2 files changed, 28 insertions(+), 20 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index f33d5e4..cc14eba 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1262,7 +1262,8 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
  }
  
  static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 -struct kvm_memory_slot *slot, unsigned long data)
 +struct kvm_memory_slot *slot, gfn_t gfn, int level,
 +unsigned long data)
  {
   u64 *sptep;
   struct rmap_iterator iter;
 @@ -1270,7 +1271,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned 
 long *rmapp,
  
   while ((sptep = rmap_get_first(*rmapp, iter))) {
   BUG_ON(!(*sptep  PT_PRESENT_MASK));
 - rmap_printk(kvm_rmap_unmap_hva: spte %p %llx\n, sptep, 
 *sptep);
 + rmap_printk(kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n,
 +  sptep, *sptep, gfn, level);
  
   drop_spte(kvm, sptep);
   need_tlb_flush = 1;
 @@ -1280,7 +1282,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned 
 long *rmapp,
  }
  
  static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 -  struct kvm_memory_slot *slot, unsigned long data)
 +  struct kvm_memory_slot *slot, gfn_t gfn, int level,
 +  unsigned long data)
  {
   u64 *sptep;
   struct rmap_iterator iter;
 @@ -1294,7 +1297,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned 
 long *rmapp,
  
   for (sptep = rmap_get_first(*rmapp, iter); sptep;) {
   BUG_ON(!is_shadow_present_pte(*sptep));
 - rmap_printk(kvm_set_pte_rmapp: spte %p %llx\n, sptep, *sptep);
 + rmap_printk(kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n,
 +  sptep, *sptep, gfn, level);
  
   need_flush = 1;
  
 @@ -1328,6 +1332,8 @@ static int kvm_handle_hva_range(struct kvm *kvm,
   int (*handler)(struct kvm *kvm,
  unsigned long *rmapp,
  struct kvm_memory_slot *slot,
 +gfn_t gfn,
 +int level,
  unsigned long data))
  {
   int j;
 @@ -1357,6 +1363,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
j  PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
   unsigned long idx, idx_end;
   unsigned long *rmapp;
 + gfn_t gfn = gfn_start;
  
   /*
* {idx(page_j) | page_j intersects with
 @@ -1367,8 +1374,10 @@ static int kvm_handle_hva_range(struct kvm *kvm,
  
   rmapp = __gfn_to_rmap(gfn_start, j, memslot);
  
 - for (; idx = idx_end; ++idx)
 - ret |= handler(kvm, rmapp++, memslot, data);
 + for (; idx = idx_end;
 +++idx, gfn += (1UL  KVM_HPAGE_GFN_SHIFT(j)))
 + ret |= handler(kvm, rmapp++, memslot,
 +gfn, j, data);
   }
   }
  
 @@ -1379,6 +1388,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned 
 long hva,
 unsigned long data,
 int (*handler)(struct kvm *kvm, unsigned long *rmapp,
struct kvm_memory_slot *slot,
 +  gfn_t gfn, int level,
unsigned long data))
  {
   return kvm_handle_hva_range(kvm, hva, hva + 1, data, handler);
 @@ -1400,7 +1410,8 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long 
 hva, pte_t pte)
  }
  
  static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 -  struct kvm_memory_slot *slot, unsigned long data)
 +  struct kvm_memory_slot *slot, gfn_t gfn, int level,
 +  unsigned long data)
  {
   u64 *sptep;
   struct rmap_iterator uninitialized_var(iter);
 @@ -1410,25 +1421,20 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned 
 long *rmapp,
  
   for (sptep = rmap_get_first(*rmapp, iter); sptep;
sptep = rmap_get_next(iter)) {
 - struct kvm_mmu_page *sp;
 - gfn_t gfn;

Re: KVM: remove Kconfig symbol KVM_VFIO?

2014-09-23 Thread Paolo Bonzini

On 23/09/2014 21:09, Paul Bolle wrote:
 Will,
 
 Your commit 80ce1639727e (KVM: VFIO: register kvm_device_ops
 dynamically) is included in linux-next since next-20140918. It removes
 the last usage of CONFIG_KVM_VFIO. After that commit setting KVM_VFIO is
 pointless.
 
 Is the patch to remove the Kconfig symbol KVM_VFIO from the tree queued
 somewhere? If not, should I submit a trivial patch that does that?

No, if you send that yourself it will be appreciated.  Thanks!

Paolo


Re: [PATCH v2 03/13] powerpc/spapr: vfio: Implement spapr_tce_iommu_ops

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 13:00 +1000, Alexey Kardashevskiy wrote:
 Modern IBM POWERPC systems support multiple IOMMU tables per PE
 so we need a more reliable way (compared to container_of()) to get
 a PE pointer from the iommu_table struct pointer used in IOMMU functions.
 
 At the moment IOMMU group data points to an iommu_table struct. This
 introduces a spapr_tce_iommu_group struct which keeps an iommu_owner
 and a spapr_tce_iommu_ops struct. For IODA, iommu_owner is a pointer to
 the pnv_ioda_pe struct, for others it is still a pointer to
 the iommu_table struct. The ops structs correspond to the type which
 iommu_owner points to.
 
 This defines a get_table() callback which returns an iommu_table
 by its number.
 
 As the IOMMU group data pointer points to variable type instead of
 iommu_table, VFIO SPAPR TCE driver is updated to use the new type.
 This changes the tce_container struct to store iommu_group instead of
 iommu_table.
 
 So, it was:
 - iommu_table points to iommu_group via iommu_table::it_group;
 - iommu_group points to iommu_table via iommu_group_get_iommudata();
 
 now it is:
 - iommu_table points to iommu_group via iommu_table::it_group;
 - iommu_group points to spapr_tce_iommu_group via
 iommu_group_get_iommudata();
 - spapr_tce_iommu_group points to either (depending on .get_table()):
   - iommu_table;
   - pnv_ioda_pe;
 
 This uses pnv_ioda1_iommu_get_table for both IODA12 but IODA2 will
 have own pnv_ioda2_iommu_get_table soon and pnv_ioda1_iommu_get_table
 will only be used for IODA1.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/include/asm/iommu.h|   6 ++
  arch/powerpc/include/asm/tce.h  |  13 +++
  arch/powerpc/kernel/iommu.c |  35 ++-
  arch/powerpc/platforms/powernv/pci-ioda.c   |  31 +-
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |   1 +
  arch/powerpc/platforms/powernv/pci.c|   2 +-
  arch/powerpc/platforms/pseries/iommu.c  |  10 +-
  drivers/vfio/vfio_iommu_spapr_tce.c | 148 
 ++--
  8 files changed, 208 insertions(+), 38 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index 42632c7..84ee339 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -108,13 +108,19 @@ extern void iommu_free_table(struct iommu_table *tbl, 
 const char *node_name);
   */
  extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
   int nid);
 +
 +struct spapr_tce_iommu_ops;
  #ifdef CONFIG_IOMMU_API
  extern void iommu_register_group(struct iommu_table *tbl,
 +  void *iommu_owner,
 +  struct spapr_tce_iommu_ops *ops,
int pci_domain_number, unsigned long pe_num);
  extern int iommu_add_device(struct device *dev);
  extern void iommu_del_device(struct device *dev);
  #else
  static inline void iommu_register_group(struct iommu_table *tbl,
 + void *iommu_owner,
 + struct spapr_tce_iommu_ops *ops,
   int pci_domain_number,
   unsigned long pe_num)
  {
 diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
 index 743f36b..9f159eb 100644
 --- a/arch/powerpc/include/asm/tce.h
 +++ b/arch/powerpc/include/asm/tce.h
 @@ -50,5 +50,18 @@
  #define TCE_PCI_READ 0x1 /* read from PCI allowed */
  #define TCE_VB_WRITE 0x1 /* write from VB allowed */
  
 +struct spapr_tce_iommu_group;
 +
 +struct spapr_tce_iommu_ops {
 + struct iommu_table *(*get_table)(
 + struct spapr_tce_iommu_group *data,
 + int num);
 +};
 +
 +struct spapr_tce_iommu_group {
 + void *iommu_owner;
 + struct spapr_tce_iommu_ops *ops;
 +};
 +
  #endif /* __KERNEL__ */
  #endif /* _ASM_POWERPC_TCE_H */
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index b378f78..1c5dae7 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -878,24 +878,53 @@ void iommu_free_coherent(struct iommu_table *tbl, 
 size_t size,
   */
  static void group_release(void *iommu_data)
  {
 - struct iommu_table *tbl = iommu_data;
 - tbl-it_group = NULL;
 + kfree(iommu_data);
  }
  
 +static struct iommu_table *spapr_tce_default_get_table(
 + struct spapr_tce_iommu_group *data, int num)
 +{
 + struct iommu_table *tbl = data-iommu_owner;
 +
 + switch (num) {
 + case 0:
 + if (tbl-it_size)
 + return tbl;
 + /* fallthru */
 + default:
 + return NULL;
 + }
 +}
 +
 +static struct spapr_tce_iommu_ops spapr_tce_default_ops = {
 + .get_table = spapr_tce_default_get_table
 +};
 +
  void iommu_register_group(struct

Re: [PATCH RFC] virtio-pci: share config interrupt between virtio devices

2014-09-23 Thread Stefan Fritsch

On Sunday 21 September 2014 13:21:06, Michael S. Tsirkin wrote:
 On Sun, Sep 21, 2014 at 11:36:44AM +0200, Stefan Fritsch wrote:
  On Sunday 21 September 2014 11:09:14, Michael S. Tsirkin wrote:
   On Thu, Sep 18, 2014 at 09:18:37PM +0200, Stefan Fritsch wrote:
On Monday 01 September 2014 09:37:30, Michael S. Tsirkin 
wrote:
 Why do we need INT#x?
 How about setting IRQF_SHARED for the config interrupt
 while using MSI-X? You'd have to read ISR to check that the
 interrupt was intended for your device.
   

   
The virtio 0.9.5 spec says that ISR is unused when in MSI-X
mode. I  don't think that you can depend on the device to set
the
configuration changed bit.
The virtio 1.0 spec seems to have fixed that.
  
   
  
   Yes, virtio 0.9.5 has this bug. But in practice qemu always set
   this bit, so for qemu we could do that
   unconditionally.  Pekka's lkvm tool doesn't
   unfortunately.  It's easy to fix that, but it would be nicer to
   additionally probe for old versions of the tool, and disable
   IRQF_SHARED in that case.
 
  
 
  What about other implementations? I think Linux should try to
  conform  to the spec so that all device implementations which
  conform to the spec just work.
 
  
 
  One implementation that comes to mind is virtualbox. But from a
  quick  look at the source, it seems that it sets the ISR bit
  always, too. And it uses qemu's subsystem vendor id.
 
  
 
  But there are other implementations. For example bhyve.
 
 I couldn't find any code in bhyve that sets VTCFG_ISR_CONF_CHANGED.
 Maybe it doesn't generate config changed interrupts?
 
 bhyve sets subsystem vendor to 0 apparently?
 We could use that to detect it.

My point was that there are many virtio implementations by now and you 
can't assume you know all of them.

 But maybe we should just make it a 1.0 only feature.

FWIW, I think that would be the better option.
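
For reference, the IRQF_SHARED idea discussed in this thread can be sketched
as below. This is a userspace model, not driver code: struct fake_vdev,
read_isr() and config_irq_handler() are hypothetical names, the two bit
values follow the legacy virtio ISR layout, and the register is assumed to
be read-to-clear.

```c
#include <stdint.h>

/* Legacy virtio ISR status bits. */
#define ISR_QUEUE  0x1u  /* a virtqueue needs service */
#define ISR_CONFIG 0x2u  /* device configuration changed */

enum irq_result { IRQ_NOT_MINE = 0, IRQ_WAS_HANDLED = 1 };

/* Hypothetical stand-in for one device sharing the config interrupt. */
struct fake_vdev {
    uint8_t isr;              /* models the read-to-clear ISR register */
    unsigned config_changes;  /* config changes processed so far */
};

/* Reading the ISR register returns its value and clears it. */
static uint8_t read_isr(struct fake_vdev *vdev)
{
    uint8_t val = vdev->isr;

    vdev->isr = 0;
    return val;
}

/*
 * Handler for a config interrupt shared between devices: claim the
 * interrupt only when this device's ISR reports a config change.
 */
static enum irq_result config_irq_handler(struct fake_vdev *vdev)
{
    if (!(read_isr(vdev) & ISR_CONFIG))
        return IRQ_NOT_MINE;
    vdev->config_changes++;
    return IRQ_WAS_HANDLED;
}
```

A device that never sets the config bit while in MSI-X mode, which 0.9.5
permits, would never be claimed by such a handler. That is the
interoperability concern raised above.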

Re: [PATCH v2 04/13] powerpc/powernv: Convert/move set_bypass() callback to take_ownership()

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 13:00 +1000, Alexey Kardashevskiy wrote:
 At the moment the iommu_table struct has a set_bypass() which enables/
 disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
 which calls this callback when external IOMMU users such as VFIO are
 about to get over a PHB.
 
 Since the set_bypass() is not really an iommu_table function but PE's
 function, and we have an ops struct per IOMMU owner, let's move
 set_bypass() to the spapr_tce_iommu_ops struct.
 
 As arch/powerpc/kernel/iommu.c is more about POWERPC IOMMU tables and
 has very little to do with PEs, this moves take_ownership() calls to
 the VFIO SPAPR TCE driver.
 
 This renames set_bypass() to take_ownership() as it is not necessarily
 just enabling bypassing, it can be something else/more so let's give it
 a generic name. The bool parameter is inverted.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Reviewed-by: Gavin Shan gws...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/iommu.h  |  1 -
  arch/powerpc/include/asm/tce.h|  2 ++
  arch/powerpc/kernel/iommu.c   | 12 
  arch/powerpc/platforms/powernv/pci-ioda.c | 20 
  drivers/vfio/vfio_iommu_spapr_tce.c   | 16 
  5 files changed, 30 insertions(+), 21 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index 84ee339..2b0b01d 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -77,7 +77,6 @@ struct iommu_table {
  #ifdef CONFIG_IOMMU_API
   struct iommu_group *it_group;
  #endif
 - void (*set_bypass)(struct iommu_table *tbl, bool enable);
  };
  
  /* Pure 2^n version of get_order */
 diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
 index 9f159eb..e6355f9 100644
 --- a/arch/powerpc/include/asm/tce.h
 +++ b/arch/powerpc/include/asm/tce.h
 @@ -56,6 +56,8 @@ struct spapr_tce_iommu_ops {
   struct iommu_table *(*get_table)(
   struct spapr_tce_iommu_group *data,
   int num);
 + void (*take_ownership)(struct spapr_tce_iommu_group *data,
 + bool enable);

set is a better verb when using a bool to specify direction, imho.

This is pretty confusing now that we have

iommu_take_ownership()
data->ops->take_ownership(true)

iommu_release_ownership()
data->ops->take_ownership(false)

And there are no comments here about what take_ownership is supposed to
provide, or get_table for that matter.

  };
  
  struct spapr_tce_iommu_group {
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index 1c5dae7..c2c8d9d 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -1139,14 +1139,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
   memset(tbl-it_map, 0xff, sz);
   iommu_clear_tces_and_put_pages(tbl, tbl-it_offset, tbl-it_size);
  
 - /*
 -  * Disable iommu bypass, otherwise the user can DMA to all of
 -  * our physical memory via the bypass window instead of just
 -  * the pages that has been explicitly mapped into the iommu
 -  */
 - if (tbl-set_bypass)
 - tbl-set_bypass(tbl, false);
 -
   return 0;
  }
  EXPORT_SYMBOL_GPL(iommu_take_ownership);
 @@ -1161,10 +1153,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
   /* Restore bit#0 set by iommu_init_table() */
   if (tbl-it_offset == 0)
   set_bit(0, tbl-it_map);
 -
 - /* The kernel owns the device now, we can restore the iommu bypass */
 - if (tbl-set_bypass)
 - tbl-set_bypass(tbl, true);
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);
  
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index 2d32a1c..8cb2f31 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1105,10 +1105,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
 *phb,
   __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
  }
  
 -static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
  {
 - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
 -   tce32.table);
   uint16_t window_id = (pe-pe_number  1 ) + 1;
   int64_t rc;
  
 @@ -1136,7 +1134,7 @@ static void pnv_pci_ioda2_set_bypass(struct iommu_table 
 *tbl, bool enable)
* host side.
*/
   if (pe-pdev)
 - set_iommu_table_base(pe-pdev-dev, tbl);
 + set_iommu_table_base(pe-pdev-dev, pe-tce32.table);
   else
   pnv_ioda_setup_bus_dma(pe, pe-pbus, false);
   }
 @@ -1152,15 +1150,21 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct 
 pnv_phb *phb,
   /* TVE #1 is selected

Re: [PATCH v2 13/13] vfio: powerpc/spapr: Enable Dynamic DMA windows

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 13:01 +1000, Alexey Kardashevskiy wrote:
 This defines and implements VFIO IOMMU API which lets the userspace
 create and remove DMA windows.
 
 This updates VFIO_IOMMU_SPAPR_TCE_GET_INFO to return the number of
 available windows and page mask.
 
 This adds VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE
 to allow the user space to create and remove window(s).
 
 The VFIO IOMMU driver does basic sanity checks and calls corresponding
 SPAPR TCE functions. At the moment only IODA2 (POWER8 PCI host bridge)
 implements them.
 
 This advertises VFIO_IOMMU_SPAPR_TCE_FLAG_DDW capability via
 VFIO_IOMMU_SPAPR_TCE_GET_INFO.
 
 This calls platform DDW reset() callback when IOMMU is being disabled
 to reset the DMA configuration to its original state.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/vfio_iommu_spapr_tce.c | 135 
 ++--
  include/uapi/linux/vfio.h   |  25 ++-
  2 files changed, 153 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
 b/drivers/vfio/vfio_iommu_spapr_tce.c
 index 0dccbc4..b518891 100644
 --- a/drivers/vfio/vfio_iommu_spapr_tce.c
 +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
 @@ -190,18 +190,25 @@ static void tce_iommu_disable(struct tce_container 
 *container)
  
   container-enabled = false;
  
 - if (!container-grp || !current-mm)
 + if (!container-grp)
   return;
  
   data = iommu_group_get_iommudata(container-grp);
   if (!data || !data-iommu_owner || !data-ops-get_table)
   return;
  
 - tbl = data-ops-get_table(data, 0);
 - if (!tbl)
 - return;
 + if (current-mm) {
 + tbl = data-ops-get_table(data, 0);
 + if (tbl)
 + decrement_locked_vm(tbl);
  
 - decrement_locked_vm(tbl);
 + tbl = data-ops-get_table(data, 1);
 + if (tbl)
 + decrement_locked_vm(tbl);
 + }
 +
 + if (data-ops-reset)
 + data-ops-reset(data);
  }
  
  static void *tce_iommu_open(unsigned long arg)
 @@ -243,7 +250,7 @@ static long tce_iommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
  {
   struct tce_container *container = iommu_data;
 - unsigned long minsz;
 + unsigned long minsz, ddwsz;
   long ret;
  
   switch (cmd) {
 @@ -288,6 +295,28 @@ static long tce_iommu_ioctl(void *iommu_data,
   info.dma32_window_size = tbl-it_size  tbl-it_page_shift;
   info.flags = 0;
  
 + ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info,
 + page_size_mask);
 +
 + if (info.argsz == ddwsz) {

>=

 + if (data->ops->query && data->ops->create &&
 + data->ops->remove) {
 + info.flags |= VFIO_IOMMU_SPAPR_TCE_FLAG_DDW;

I think you want to set this flag regardless of whether the user has
provided space for it.  A valid use model is to call with the minimum
size and look at the flags to determine if it needs to be called again
with a larger size.
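
To illustrate the use model described above, here is a minimal userspace
sketch of a GET_INFO-style handler. Everything in it is hypothetical (struct
tce_info, FLAG_DDW, get_info() and the placeholder values written into the
extended fields). It only shows the capability flag being set independently
of argsz, with the extended fields copied out only when the caller left room
for them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLAG_DDW (1u << 0)  /* hypothetical stand-in for the DDW flag */

/* Hypothetical layout: a base part followed by optional DDW fields. */
struct tce_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t dma32_window_start;
    uint32_t dma32_window_size;
    /* extended fields, copied only when the caller has room for them */
    uint32_t windows_available;
    uint32_t page_size_mask;
};

#define INFO_MINSZ (offsetof(struct tce_info, dma32_window_size) + sizeof(uint32_t))
#define INFO_DDWSZ (offsetof(struct tce_info, page_size_mask) + sizeof(uint32_t))

/*
 * Model of the handler: user_buf and argsz describe the caller's buffer.
 * Returns the number of bytes written, or -1 if the buffer is too small.
 */
static long get_info(void *user_buf, uint32_t argsz, int has_ddw)
{
    struct tce_info info;
    size_t copysz = INFO_MINSZ;

    if (argsz < INFO_MINSZ)
        return -1;

    memset(&info, 0, sizeof(info));
    info.argsz = argsz;
    if (has_ddw)
        info.flags |= FLAG_DDW;  /* advertised regardless of argsz */

    if (argsz >= INFO_DDWSZ) {
        if (has_ddw) {
            info.windows_available = 1;   /* placeholder value */
            info.page_size_mask = 0x1000; /* placeholder value */
        }
        copysz = INFO_DDWSZ;
    }
    memcpy(user_buf, &info, copysz);
    return (long)copysz;
}
```

A caller can pass the minimum size first, test FLAG_DDW in the returned
flags, and only then call again with the larger size.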

 +
 + ret = data-ops-query(data,
 + info.current_windows,
 + info.windows_available,
 + info.page_size_mask);
 + if (ret)
 + return ret;
 + } else {
 + info.current_windows = 0;
 + info.windows_available = 0;
 + info.page_size_mask = 0;
 + }
 + minsz = ddwsz;

It's not really any longer the min size, is it?

 + }
 +
   if (copy_to_user((void __user *)arg, info, minsz))
   return -EFAULT;
  
 @@ -412,12 +441,106 @@ static long tce_iommu_ioctl(void *iommu_data,
   tce_iommu_disable(container);
   mutex_unlock(container-lock);
   return 0;
 +
   case VFIO_EEH_PE_OP:
   if (!container-grp)
   return -ENODEV;
  
   return vfio_spapr_iommu_eeh_ioctl(container-grp,
 cmd, arg);
 +
 + case VFIO_IOMMU_SPAPR_TCE_CREATE: {
 + struct vfio_iommu_spapr_tce_create create;
 + struct spapr_tce_iommu_group *data;
 + struct iommu_table *tbl;
 +
 + if (WARN_ON(!container-grp))

redux previous comment on this warning

 + return -ENXIO;
 +
 + data = iommu_group_get_iommudata(container-grp);
 +
 + minsz = offsetofend(struct vfio_iommu_spapr_tce_create,
 + start_addr);
 +
 + if (copy_from_user(create, (void __user *)arg, minsz))
 +

Re: [PATCHv7 01/26] iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 15:58 +0100, Will Deacon wrote:
 Hi Antonios,
 
 On Tue, Sep 23, 2014 at 03:46:00PM +0100, Antonios Motakis wrote:
  Exposing the XN flag of the SMMU driver as IOMMU_NOEXEC instead of
  IOMMU_EXEC makes it enforceable, since for IOMMUs that don't support
  the XN flag pages will always be executable.
  
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/iommu/arm-smmu.c | 9 +
   include/linux/iommu.h| 2 +-
   2 files changed, 6 insertions(+), 5 deletions(-)
 
 [...]
 
  diff --git a/include/linux/iommu.h b/include/linux/iommu.h
  index 20f9a52..e1a644c 100644
  --- a/include/linux/iommu.h
  +++ b/include/linux/iommu.h
  @@ -27,7 +27,7 @@
   #define IOMMU_READ (1 << 0)
   #define IOMMU_WRITE (1 << 1)
   #define IOMMU_CACHE (1 << 2) /* DMA cache coherency */
  -#define IOMMU_EXEC (1 << 3)
  +#define IOMMU_NOEXEC (1 << 3)
 
 This hunk needs to be a separate patch merged by Joerg before I can take the
 arm-smmu part (which looks fine).

That separate hunk would be unbuildable since arm-smmu depends on the
IOMMU_EXEC define.  Patch 2/ is also in iommu code and gates patch 3/ in
arm-smmu.  The IOMMU-core changes are pretty trivial, so perhaps Joerg
would be willing to ACK 1 & 2 and let Will include the first 3 patches
through his tree.

These first 3 patches should have been sent on their own, since they're
small and obvious, so they don't get hung up on the rest of the series.
Thanks,

Alex


Re: [PATCHv7 05/26] vfio: introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 16:46 +0200, Antonios Motakis wrote:
 We introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag to the VFIO dma map call,
 and expose its availability via the capability VFIO_IOMMU_PROT_NOEXEC.
 This way the user can control whether the XN flag will be set on the
 requested mappings. The IOMMU_NOEXEC flag needs to be available for all
 the IOMMUs of the container used.
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  include/uapi/linux/vfio.h | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
 index 6612974..30f630c 100644
 --- a/include/uapi/linux/vfio.h
 +++ b/include/uapi/linux/vfio.h
 @@ -29,6 +29,7 @@
   * capability is subject to change as groups are added or removed.
   */
  #define VFIO_DMA_CC_IOMMU4
 +#define VFIO_IOMMU_PROT_NOEXEC   5

Can't we advertise this as a flag bit in vfio_iommu_type1_info instead?
Also, EEH already took 5 as seen immediately below.

  
  /* Check if EEH is supported */
  #define VFIO_EEH 5
 @@ -401,6 +402,7 @@ struct vfio_iommu_type1_dma_map {
   __u32   flags;
  #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
 +#define VFIO_DMA_MAP_FLAG_NOEXEC (1 << 2) /* not executable from device */
   __u64   vaddr;  /* Process virtual address */
   __u64   iova;   /* IO virtual address */
   __u64   size;   /* Size of mapping (bytes) */




Re: [PATCHv7 05/26] vfio: introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 16:21 -0600, Alex Williamson wrote:
 On Tue, 2014-09-23 at 16:46 +0200, Antonios Motakis wrote:
  We introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag to the VFIO dma map call,
  and expose its availability via the capability VFIO_IOMMU_PROT_NOEXEC.
  This way the user can control whether the XN flag will be set on the
  requested mappings. The IOMMU_NOEXEC flag needs to be available for all
  the IOMMUs of the container used.
  
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   include/uapi/linux/vfio.h | 2 ++
   1 file changed, 2 insertions(+)
  
  diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
  index 6612974..30f630c 100644
  --- a/include/uapi/linux/vfio.h
  +++ b/include/uapi/linux/vfio.h
  @@ -29,6 +29,7 @@
* capability is subject to change as groups are added or removed.
*/
   #define VFIO_DMA_CC_IOMMU  4
  +#define VFIO_IOMMU_PROT_NOEXEC 5
 
 Can't we advertise this as a flag bit in vfio_iommu_type1_info instead?

Ok, I see in the next patch that it's pretty similar to
VFIO_DMA_CC_IOMMU, so the check extension is probably correct for
determining the current state.  Maybe we could name it more similarly,
VFIO_DMA_NOEXEC_IOMMU.  I guess the intended usage is that once a user
attaches a group to the container they can query whether the
VFIO_DMA_MAP_FLAG_NOEXEC is valid.  Ok.  Thanks,

Alex

 Also, EEH already took 5 as seen immediately below.
 
   
   /* Check if EEH is supported */
   #define VFIO_EEH   5
  @@ -401,6 +402,7 @@ struct vfio_iommu_type1_dma_map {
  __u32   flags;
   #define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
   #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
  +#define VFIO_DMA_MAP_FLAG_NOEXEC (1 << 2) /* not executable from device */
  __u64   vaddr;  /* Process virtual address */
  __u64   iova;   /* IO virtual address */
  __u64   size;   /* Size of mapping (bytes) */
 
 




Re: [PATCHv7 06/26] vfio/iommu_type1: implement the VFIO_DMA_MAP_FLAG_NOEXEC flag

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 16:46 +0200, Antonios Motakis wrote:
 Some IOMMU drivers, such as the ARM SMMU driver, make available the
 IOMMU_NOEXEC flag, to set the page tables for a device as XN (execute never).
 This affects devices such as the ARM PL330 DMA Controller, which respects
 this flag and will refuse to fetch DMA instructions from memory where the
 XN flag has been set.
 
 The flag can be used only if all IOMMU domains behind the container support
 the IOMMU_NOEXEC flag. Also, if any mappings are created with the flag, any
 new domains with devices will have to support it as well.
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/vfio_iommu_type1.c | 38 +-
  1 file changed, 37 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
 index 0734fbe..09e5064 100644
 --- a/drivers/vfio/vfio_iommu_type1.c
 +++ b/drivers/vfio/vfio_iommu_type1.c
 @@ -81,6 +81,26 @@ struct vfio_group {
  };
  
  /*
 + * This function returns true only if _all_ domains support the capability.
 + */
 +static int vfio_all_domains_have_iommu_noexec(struct vfio_iommu *iommu)

Rename to vfio_domains_have_iommu_noexec() for consistency with the
cache version.

 +{
 + struct vfio_domain *d;
 + int ret = 1;
 +
 + mutex_lock(&iommu->lock);
 + list_for_each_entry(d, &iommu->domain_list, next) {
 + if (!iommu_domain_has_cap(d->domain, IOMMU_CAP_NOEXEC)) {

Should we cache this in domain->prot like we do for IOMMU_CACHE?

 + ret = 0;
 + break;
 + }
 + }
 + mutex_unlock(&iommu->lock);
 +
 + return ret;
 +}
 +
 +/*
   * This code handles mapping and unmapping of user data buffers
   * into DMA'ble space using the IOMMU
   */
 @@ -546,6 +566,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
   prot |= IOMMU_WRITE;
   if (map-flags  VFIO_DMA_MAP_FLAG_READ)
   prot |= IOMMU_READ;
 + if (map-flags  VFIO_DMA_MAP_FLAG_NOEXEC) {
 + if (!vfio_all_domains_have_iommu_noexec(iommu))
 + return -EINVAL;
 + prot |= IOMMU_NOEXEC;
 + }
  
   if (!prot || !size || (size | iova | vaddr)  mask)
   return -EINVAL;
 @@ -636,6 +661,12 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
   dma = rb_entry(n, struct vfio_dma, node);
   iova = dma-iova;
  
 + /* if any of the mappings to be replayed has the NOEXEC flag
 +  * set, then the new iommu domain must support it */

nit, please fix the comment style to match the rest of the file.

 + if ((dma->prot | IOMMU_NOEXEC) &&
 + !iommu_domain_has_cap(domain->domain, IOMMU_CAP_NOEXEC))
 + return -EINVAL;
 +
   while (iova  dma-iova + dma-size) {
   phys_addr_t phys = iommu_iova_to_phys(d-domain, iova);
   size_t size;
 @@ -890,6 +921,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
   if (!iommu)
   return 0;
   return vfio_domains_have_iommu_cache(iommu);
 + case VFIO_IOMMU_PROT_NOEXEC:
 + if (!iommu)
 + return 0;
 + return vfio_all_domains_have_iommu_noexec(iommu);
   default:
   return 0;
   }
 @@ -913,7 +948,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
   } else if (cmd == VFIO_IOMMU_MAP_DMA) {
   struct vfio_iommu_type1_dma_map map;
   uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
 - VFIO_DMA_MAP_FLAG_WRITE;
 + VFIO_DMA_MAP_FLAG_WRITE |
 + VFIO_DMA_MAP_FLAG_NOEXEC;
  
   minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
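
The caching suggestion above might look roughly like the following userspace
sketch. It assumes each domain records its capabilities in a prot field when
it is set up, so the "all domains" check becomes a walk over cached bits.
struct fake_domain and all_domains_have() are hypothetical names; only the
two IOMMU_* values come from the patches in this series.

```c
#include <stdbool.h>
#include <stddef.h>

#define IOMMU_CACHE  (1 << 2)
#define IOMMU_NOEXEC (1 << 3)

/* Hypothetical simplified domain: prot caches capability bits. */
struct fake_domain {
    int prot;
    struct fake_domain *next;
};

/*
 * True only if every domain in the list has `cap` cached in prot.
 * An empty list yields true, like the `ret = 1` default in the patch.
 */
static bool all_domains_have(const struct fake_domain *head, int cap)
{
    for (const struct fake_domain *d = head; d; d = d->next)
        if (!(d->prot & cap))
            return false;
    return true;
}
```

With the bits cached, the NOEXEC and CACHE checks share one helper and no
longer need a per-call capability query on each domain.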
  




Re: [PATCHv7 07/26] driver core: amba: add device binding path 'driver_override'

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 16:46 +0200, Antonios Motakis wrote:
 As already demonstrated with PCI [1] and the platform bus [2], a
 driver_override property in sysfs can be used to bypass the id matching
 of a device to a AMBA driver. This can be used by VFIO to bind to any AMBA
 device requested by the user.
 
 [1] 
 http://lists-archives.com/linux-kernel/28030441-pci-introduce-new-device-binding-path-using-pci_dev-driver_override.html
 [2] https://www.redhat.com/archives/libvir-list/2014-April/msg00382.html
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/amba/bus.c   | 44 
  include/linux/amba/bus.h |  1 +
  2 files changed, 45 insertions(+)

Why are 7 & 8 being dragged into this series?  Combine them into a
single patch and post it separately.

 
 diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
 index 3cf61a1..473177c 100644
 --- a/drivers/amba/bus.c
 +++ b/drivers/amba/bus.c
 @@ -17,6 +17,7 @@
  #include <linux/pm_runtime.h>
  #include <linux/amba/bus.h>
  #include <linux/sizes.h>
 +#include <linux/limits.h>
  
  #include <asm/irq.h>
  
 @@ -42,6 +43,10 @@ static int amba_match(struct device *dev, struct 
 device_driver *drv)
   struct amba_device *pcdev = to_amba_device(dev);
   struct amba_driver *pcdrv = to_amba_driver(drv);
  
 + /* When driver_override is set, only bind to the matching driver */
 + if (pcdev-driver_override)
 + return !strcmp(pcdev-driver_override, drv-name);
 +
   return amba_lookup(pcdrv-id_table, pcdev) != NULL;
  }
  
 @@ -58,6 +63,44 @@ static int amba_uevent(struct device *dev, struct 
 kobj_uevent_env *env)
   return retval;
  }
  
 +static ssize_t driver_override_show(struct device *_dev,
 + struct device_attribute *attr, char *buf)
 +{
 + struct amba_device *dev = to_amba_device(_dev);
 +
 + return sprintf(buf, "%s\n", dev->driver_override);
 +}
 +
 +static ssize_t driver_override_store(struct device *_dev,
 +  struct device_attribute *attr,
 +  const char *buf, size_t count)
 +{
 + struct amba_device *dev = to_amba_device(_dev);
 + char *driver_override, *old = dev->driver_override, *cp;
 +
 + if (count > PATH_MAX)
 + return -EINVAL;
 +
 + driver_override = kstrndup(buf, count, GFP_KERNEL);
 + if (!driver_override)
 + return -ENOMEM;
 +
 + cp = strchr(driver_override, '\n');
 + if (cp)
 + *cp = '\0';
 +
 + if (strlen(driver_override)) {
 + dev->driver_override = driver_override;
 + } else {
 + kfree(driver_override);
 + dev->driver_override = NULL;
 + }
 +
 + kfree(old);
 +
 + return count;
 +}
 +
  #define amba_attr_func(name,fmt,arg...)  
 \
  static ssize_t name##_show(struct device *_dev,  
 \
  struct device_attribute *attr, char *buf)\
 @@ -80,6 +123,7 @@ amba_attr_func(resource, \t%016llx\t%016llx\t%016lx\n,
  static struct device_attribute amba_dev_attrs[] = {
   __ATTR_RO(id),
   __ATTR_RO(resource),
 + __ATTR_RW(driver_override),
   __ATTR_NULL,
  };
  
 diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
 index fdd7e1b..7c011e7 100644
 --- a/include/linux/amba/bus.h
 +++ b/include/linux/amba/bus.h
 @@ -32,6 +32,7 @@ struct amba_device {
   struct clk  *pclk;
   unsigned intperiphid;
   unsigned intirq[AMBA_NR_IRQS];
 + char*driver_override;
  };
  
  struct amba_driver {
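
As a rough illustration of the driver_override behaviour described in the
commit message, the match rule and the sysfs store can be modelled in
userspace as below. struct fake_dev, dev_matches() and set_override() are
hypothetical names, and id_matches merely stands in for the result of the
normal ID table lookup.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct fake_dev {
    char *driver_override;  /* NULL means "use normal ID matching" */
    bool id_matches;        /* stands in for the ID table lookup result */
};

/* When an override is set, only the named driver may bind. */
static bool dev_matches(const struct fake_dev *dev, const char *drv_name)
{
    if (dev->driver_override)
        return strcmp(dev->driver_override, drv_name) == 0;
    return dev->id_matches;
}

/*
 * Store a new override from a sysfs-style write of `count` bytes.
 * A trailing newline is stripped; an empty string clears the override.
 * Returns 0 on success, -1 on allocation failure.
 */
static int set_override(struct fake_dev *dev, const char *buf, size_t count)
{
    char *copy = malloc(count + 1);
    char *nl;

    if (!copy)
        return -1;
    memcpy(copy, buf, count);
    copy[count] = '\0';

    nl = strchr(copy, '\n');
    if (nl)
        *nl = '\0';

    free(dev->driver_override);
    if (copy[0] != '\0') {
        dev->driver_override = copy;
    } else {
        free(copy);
        dev->driver_override = NULL;
    }
    return 0;
}
```

Writing a driver name restricts binding to that driver; writing an empty
line restores normal ID matching.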




Re: [PATCHv7 10/26] vfio: platform: probe to devices on the platform bus

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 16:46 +0200, Antonios Motakis wrote:
 Driver to bind to Linux platform devices, and callbacks to discover their
 resources to be used by the main VFIO PLATFORM code.
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/platform/vfio_platform.c | 96 
 +++
  include/uapi/linux/vfio.h |  1 +
  2 files changed, 97 insertions(+)
  create mode 100644 drivers/vfio/platform/vfio_platform.c
 
 diff --git a/drivers/vfio/platform/vfio_platform.c 
 b/drivers/vfio/platform/vfio_platform.c
 new file mode 100644
 index 000..024c026
 --- /dev/null
 +++ b/drivers/vfio/platform/vfio_platform.c
 @@ -0,0 +1,96 @@
 +/*
 + * Copyright (C) 2013 - Virtual Open Systems
 + * Author: Antonios Motakis a.mota...@virtualopensystems.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License, version 2, as
 + * published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + */
 +
 +#include <linux/device.h>
 +#include <linux/eventfd.h>
 +#include <linux/interrupt.h>
 +#include <linux/iommu.h>
 +#include <linux/module.h>
 +#include <linux/mutex.h>
 +#include <linux/notifier.h>
 +#include <linux/pm_runtime.h>
 +#include <linux/slab.h>
 +#include <linux/types.h>
 +#include <linux/uaccess.h>
 +#include <linux/vfio.h>
 +#include <linux/io.h>
 +#include <linux/platform_device.h>
 +#include <linux/irq.h>
 +
 +#include "vfio_platform_private.h"
 +
 +#define DRIVER_VERSION  "0.7"
 +#define DRIVER_AUTHOR   "Antonios Motakis <a.mota...@virtualopensystems.com>"
 +#define DRIVER_DESC "VFIO for platform devices - User Level meta-driver"
 +
 +/* probing devices from the linux platform bus */
 +
 +static struct resource *get_platform_resource(struct vfio_platform_device 
 *vdev,
 + int i)
 +{
 + struct platform_device *pdev = (struct platform_device *) vdev-opaque;
 +
 + return platform_get_resource(pdev, IORESOURCE_MEM, i);

ARM may only support IORESOURCE_MEM, but I don't think platform devices
are limited to MMIO, right?  vfio-platform shouldn't be either.

 +}
 +
 +static int get_platform_irq(struct vfio_platform_device *vdev, int i)
 +{
 + struct platform_device *pdev = (struct platform_device *) vdev-opaque;
 +
 + return platform_get_irq(pdev, i);
 +}
 +
 +
 +static int vfio_platform_probe(struct platform_device *pdev)
 +{
 + struct vfio_platform_device *vdev;
 + int ret;
 +
 + vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
 + if (!vdev)
 + return -ENOMEM;
 +
 + vdev->opaque = (void *) pdev;
 + vdev->name = pdev->name;
 + vdev->flags = VFIO_DEVICE_FLAGS_PLATFORM;
 + vdev->get_resource = get_platform_resource;
 + vdev->get_irq = get_platform_irq;
 +
 + ret = vfio_platform_probe_common(vdev, &pdev->dev);
 + if (ret)
 + kfree(vdev);
 +
 + return ret;
 +}
 +
 +static int vfio_platform_remove(struct platform_device *pdev)
 +{
 + return vfio_platform_remove_common(pdev-dev);
 +}
 +
 +static struct platform_driver vfio_platform_driver = {
 + .probe  = vfio_platform_probe,
 + .remove = vfio_platform_remove,
 + .driver = {
 + .name   = vfio-platform,
 + .owner  = THIS_MODULE,
 + },
 +};
 +
 +module_platform_driver(vfio_platform_driver);
 +
 +MODULE_VERSION(DRIVER_VERSION);
 +MODULE_LICENSE("GPL v2");
 +MODULE_AUTHOR(DRIVER_AUTHOR);
 +MODULE_DESCRIPTION(DRIVER_DESC);
 diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
 index 30f630c..b022a25 100644
 --- a/include/uapi/linux/vfio.h
 +++ b/include/uapi/linux/vfio.h
 @@ -158,6 +158,7 @@ struct vfio_device_info {
   __u32   flags;
  #define VFIO_DEVICE_FLAGS_RESET  (1  0)/* Device supports 
 reset */
  #define VFIO_DEVICE_FLAGS_PCI(1  1)/* vfio-pci device */
 +#define VFIO_DEVICE_FLAGS_PLATFORM (1  2)  /* vfio-platform device */
   __u32   num_regions;/* Max region index + 1 */
   __u32   num_irqs;   /* Max IRQ index + 1 */
  };




Re: [PATCHv7 13/26] vfio: amba: add the VFIO for AMBA devices module to Kconfig

2014-09-23 Thread Alex Williamson

On Tue, 2014-09-23 at 16:46 +0200, Antonios Motakis wrote:
 Enable building the VFIO AMBA driver. VFIO_AMBA depends on VFIO_PLATFORM,
 since it is sharing a portion of the code, and it is essentially implemented
 as a platform device whose resources are discovered via AMBA specific APIs
 in the kernel.
 
 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/platform/Kconfig  | 10 ++
  drivers/vfio/platform/Makefile |  4 
  2 files changed, 14 insertions(+)
 
 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 index c51af17..8b97786 100644
 --- a/drivers/vfio/platform/Kconfig
 +++ b/drivers/vfio/platform/Kconfig
 @@ -7,3 +7,13 @@ config VFIO_PLATFORM
 framework.
  
 If you don't know what to do here, say N.
 +
 +config VFIO_AMBA
 + tristate "VFIO support for AMBA devices"
 + depends on VFIO && VFIO_PLATFORM && EVENTFD && ARM_AMBA

nit, VFIO_PLATFORM already depends on VFIO && EVENTFD

 + help
 +   Support for ARM AMBA devices with VFIO. This is required to make
 +   use of ARM AMBA devices present on the system using the VFIO
 +   framework.
 +
 +   If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
 index 279862b..1957170 100644
 --- a/drivers/vfio/platform/Makefile
 +++ b/drivers/vfio/platform/Makefile
 @@ -2,3 +2,7 @@
  vfio-platform-y := vfio_platform.o vfio_platform_common.o
  
  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 +
 +vfio-amba-y := vfio_amba.o
 +
 +obj-$(CONFIG_VFIO_AMBA) += vfio-amba.o




[PATCH 1/1] kvm, mem-hotplug: Add arch specific mmu notifier to handle apic access migration.

2014-09-23 Thread Tang Chen

Hi Paolo, 

I'm not sure whether this patch follows your comment correctly, so please
review it. All of the other comments have been addressed. If this patch is
OK, I'll send v8 soon.

Thanks.

This patch handles the case where L1 and L2 share one APIC access page and
that page is migrated. Migration needs handling in the following situations:

   1) when L0 is running: Update L1's vmcs in the next L0->L1 entry and L2's
      vmcs in the next L1->L2 entry.

   2) when L1 is running: Force a L1->L0 exit, update L1's vmcs in the next
      L0->L1 entry and L2's vmcs in the next L1->L2 entry.

   3) when L2 is running: Force a L2->L0 exit, update L2's vmcs in the next
      L0->L2 entry and L1's vmcs in the next L2->L1 exit.

This patch forces an L1->L0 or L2->L0 exit, via the mmu notifier, when the
shared APIC access page is migrated. Since the APIC access page is only used
on Intel x86, this is arch-specific code.
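
The shape of the arch hook can be sketched in userspace as follows. This is
only a model under stated assumptions, not the code in this patch: struct
fake_kvm, its fields, PAGE_SIZE_SKETCH and both functions are hypothetical,
and reload_requested merely stands in for "force an exit and update the
vmcs on the next entry".

```c
#include <stdbool.h>

#define PAGE_SIZE_SKETCH 4096ul  /* assumed page size for the model */

/* Hypothetical, heavily simplified VM state. */
struct fake_kvm {
    unsigned long apic_access_hva;  /* host address backing the APIC access page */
    bool reload_requested;          /* models "exit and reload on next entry" */
};

/* Default for architectures without an APIC access page: do nothing. */
static void arch_invalidate_page_noop(struct fake_kvm *kvm,
                                      unsigned long address)
{
    (void)kvm;
    (void)address;
}

/* x86-style hook: react only when the invalidated page is the APIC access page. */
static void arch_invalidate_page_x86(struct fake_kvm *kvm,
                                     unsigned long address)
{
    unsigned long page = address & ~(PAGE_SIZE_SKETCH - 1);

    if (page == kvm->apic_access_hva)
        kvm->reload_requested = true;
}
```

Every other architecture keeps the empty hook, which is why the diffstat
below touches so many kvm_host.h headers.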
---
 arch/arm/include/asm/kvm_host.h |  6 ++
 arch/arm64/include/asm/kvm_host.h   |  6 ++
 arch/ia64/include/asm/kvm_host.h|  8 
 arch/mips/include/asm/kvm_host.h|  7 +++
 arch/powerpc/include/asm/kvm_host.h |  6 ++
 arch/s390/include/asm/kvm_host.h|  9 +
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 11 +++
 virt/kvm/kvm_main.c |  3 +++
 9 files changed, 58 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..79bbf7d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -182,6 +182,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e10c45a..ee89fad 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -192,6 +192,12 @@ static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
    return 0;
 }
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..326ac55 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -574,6 +574,14 @@ static inline struct kvm_pt_regs *vcpu_regs(struct kvm_vcpu *v)
    return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
 }
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
+
 typedef int kvm_vmm_entry(void);
 typedef void kvm_tramp_entry(union context *host, union context *guest);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7a3fc67..c392705 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -767,5 +767,12 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
 extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 
+#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+#endif /* KVM_ARCH_WANT_MMU_NOTIFIER */
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..c16a573 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -61,6 +61,12 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
+unsigned long address)
+{
+   return;
+}
+
 #define HPTEG_CACHE_NUM            (1 << 15)
 #define HPTEG_HASH_BITS_PTE        13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 773bef7..693290f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -450,4 +450,13 @@ void kvm_arch_async_page_present(struct kvm_vcpu

Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Wanpeng Li

Hi Andres,
On Mon, Sep 22, 2014 at 02:54:42PM -0700, Andres Lagar-Cavilla wrote:
1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
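
As a rough illustration of point 3, here is a minimal user-space sketch of
ageing a single spte with and without a hardware accessed bit. The spte
layout, MODEL_SPTE_PRESENT and model_age_spte() are hypothetical, and
reporting a removed spte as young is an assumption of this model, not
something stated in the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SPTE_PRESENT (1ull << 0)   /* hypothetical "present" bit */

/*
 * Age one spte. accessed_mask is the hardware accessed bit, or 0 when the
 * hardware does not provide one. Returns true if the page counts as young.
 */
static bool model_age_spte(uint64_t *spte, uint64_t accessed_mask)
{
	if (!(*spte & MODEL_SPTE_PRESENT))
		return false;

	if (accessed_mask) {
		/* Test and clear the accessed bit, keep the mapping. */
		bool young = (*spte & accessed_mask) != 0;
		*spte &= ~accessed_mask;
		return young;
	}

	/*
	 * No accessed bit: drop the mapping so the next access faults.
	 * A present spte is treated as recently used.
	 */
	*spte = 0;
	return true;
}
```

The second branch removes a mapping, so in real code it would need the same
synchronization with MMU notifier consumers as any other spte removal.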

[...]
---
+  BUG_ON(!shadow_accessed_mask);
 
   for (sptep = rmap_get_first(*rmapp, &iter); sptep;
        sptep = rmap_get_next(&iter)) {
+          struct kvm_mmu_page *sp;
+          gfn_t gfn;
           BUG_ON(!is_shadow_present_pte(*sptep));
+          /* From spte to gfn. */
+          sp = page_header(__pa(sptep));
+          gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
 
           if (*sptep & shadow_accessed_mask) {
                   young = 1;
                   clear_bit((ffs(shadow_accessed_mask) - 1),
                             (unsigned long *)sptep);
           }
+          trace_kvm_age_page(gfn, slot, young);

IIUC, all the rmap entries walked in this for loop are for the same gfn, so
the trace point above dumps a duplicate message on every iteration.
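
One possible way to avoid the duplicates, sketched in user space under the
assumption that every entry in the walked list maps the same gfn: accumulate
the result in the loop and emit the event once afterwards. The names
model_trace_age_page(), model_trace_calls and model_age_rmap() are
hypothetical stand-ins, not the kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int model_trace_calls;   /* counts emitted events in this model */

/* Stand-in for a trace point: only records that an event was emitted. */
static void model_trace_age_page(uint64_t gfn, bool young)
{
	(void)gfn;
	(void)young;
	model_trace_calls++;
}

/* Age all sptes that map one gfn and trace the combined result once. */
static bool model_age_rmap(uint64_t *sptes, size_t n, uint64_t gfn,
			   uint64_t accessed_mask)
{
	bool young = false;

	for (size_t i = 0; i < n; i++) {
		if (sptes[i] & accessed_mask) {
			young = true;
			sptes[i] &= ~accessed_mask;
		}
	}

	/* One event per gfn instead of one per spte. */
	model_trace_age_page(gfn, young);
	return young;
}
```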

Regards,
Wanpeng Li 
