Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
On 09/21/2015 09:36 AM, Linus Torvalds wrote:
>
> How many msr reads are so critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
>

Probably only the ones that may go in the context switch path.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
On Wed, Sep 30, 2015 at 7:01 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
>> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
>> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar wrote:
>> > >
>> > > Linus, what's your preference?
>> >
>> > So quite frankly, is there any reason we don't just implement
>> > native_read_msr() as just
>> >
>> >	unsigned long long native_read_msr(unsigned int msr)
>> >	{
>> >		int err;
>> >		unsigned long long val;
>> >
>> >		val = native_read_msr_safe(msr, &err);
>> >		WARN_ON_ONCE(err);
>> >		return val;
>> >	}
>> >
>> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
>> > done with it. I don't see the downside.
>> >
>> > How many msr reads are so critical that the function call
>> > overhead would matter? Get rid of the inline version of the _safe()
>> > thing too, and put that thing there too.
>>
>> There are a few in the perf code, and esp. on cores without a stack
>> engine the call overhead is noticeable. Also note that the perf MSRs
>> are generally optimized MSRs and less slow (we cannot say fast,
>> they're still MSRs) than regular MSRs.
>
> These could still be open coded in an inlined fashion, like the
> scheduler usage.

We could have a raw_rdmsr for those.

OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior
is worth the effort. This isn't a frequent source of bugs to my
knowledge, and we don't try to recover from incorrect cr writes,
out-of-bounds MMIO, etc, so do we really gain much by rigging a recovery
mechanism for rdmsr and wrmsr failures for code that doesn't use the
_safe variants?

--Andy

> Thanks,
>
> Ingo

--
Andy Lutomirski
AMA Capital Management, LLC
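Linus's proposal above can be modeled in plain userspace C to show the intended behavior: the non-safe read becomes a thin out-of-line wrapper around the _safe variant that warns once on failure instead of oopsing. Everything below (the fake MSR table, the `warned` flag standing in for WARN_ON_ONCE) is an illustrative sketch, not kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the _safe accessor: msr 0x10 "exists",
 * any other number simulates a #GP on rdmsr. */
static uint64_t native_read_msr_safe(unsigned int msr, int *err)
{
	if (msr == 0x10) {
		*err = 0;
		return 0x1234ULL;
	}
	*err = -5;	/* -EIO: the rdmsr faulted */
	return 0;
}

static int warned;	/* models the WARN_ON_ONCE one-shot state */

/* The proposed non-inline wrapper: call the _safe variant, warn once
 * if the access faulted, and carry on returning whatever it produced. */
static uint64_t native_read_msr(unsigned int msr)
{
	int err;
	uint64_t val = native_read_msr_safe(msr, &err);

	if (err && !warned) {	/* stands in for WARN_ON_ONCE(err) */
		warned = 1;
		fprintf(stderr, "unchecked MSR access: msr 0x%x\n", msr);
	}
	return val;
}
```

The design point under debate: a successful read behaves exactly as before, and a faulting read produces one warning for the whole boot rather than an oops per call site.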
Re: [Qemu-devel] [PATCH 1/3] target-i386: add a subsection of vcpu's TSC rate in vmstate_x86_cpu
* Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> On Tue, Sep 29, 2015 at 08:00:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> > > The newly added subsection 'vmstate_tsc_khz' in this patch results in
> > > vcpu's TSC rate being saved on the source machine and loaded on the
> > > target machine during the migration.
> > >
> > > Signed-off-by: Haozhong Zhang
> >
> > Hi,
> >   I'd appreciate it if you could tie this to only do it on newer
> > machine types; that way it won't break back migration.
> >
>
> Does "back migration" mean migrating from QEMU w/ vmstate_tsc_khz
> subsection to QEMU w/o that subsection?

Yes; like if we migrate from a newer qemu to an older qemu but with the
same machine type.

Dave

> - Haozhong
>
> > Dave
> >
> > > ---
> > >  target-i386/machine.c | 20 ++++++++++++++++++++
> > >  1 file changed, 20 insertions(+)
> > >
> > > diff --git a/target-i386/machine.c b/target-i386/machine.c
> > > index 9fa0563..80108a3 100644
> > > --- a/target-i386/machine.c
> > > +++ b/target-i386/machine.c
> > > @@ -752,6 +752,25 @@ static const VMStateDescription vmstate_xss = {
> > >      }
> > >  };
> > >
> > > +static bool tsc_khz_needed(void *opaque)
> > > +{
> > > +    X86CPU *cpu = opaque;
> > > +    CPUX86State *env = &cpu->env;
> > > +
> > > +    return env->tsc_khz != 0;
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_tsc_khz = {
> > > +    .name = "cpu/tsc_khz",
> > > +    .version_id = 1,
> > > +    .minimum_version_id = 1,
> > > +    .needed = tsc_khz_needed,
> > > +    .fields = (VMStateField[]) {
> > > +        VMSTATE_INT64(env.tsc_khz, X86CPU),
> > > +        VMSTATE_END_OF_LIST()
> > > +    }
> > > +};
> > > +
> > >  VMStateDescription vmstate_x86_cpu = {
> > >      .name = "cpu",
> > >      .version_id = 12,
> > > @@ -871,6 +890,7 @@ VMStateDescription vmstate_x86_cpu = {
> > >          &vmstate_msr_hyperv_crash,
> > >          &vmstate_avx512,
> > >          &vmstate_xss,
> > > +        &vmstate_tsc_khz,
> > >          NULL
> > >      }
> > >  };
> > > --
> > > 2.4.8
> > >
> >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
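Dave's request (emit the subsection only for newer machine types, so back migration keeps working) is conventionally done in QEMU by wiring a compat property that old machine types clear, and consulting it in the subsection's .needed callback. A minimal userspace model of that gating logic — the struct and field names here are hypothetical, not QEMU's actual API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a VMState .needed callback gated on a
 * machine-type compat flag: old machine types set migrate_tsc_khz
 * to false, so a stream aimed at an older QEMU never carries the
 * subsection. */
struct x86_cpu_model {
	int64_t tsc_khz;	/* 0 means "no TSC rate configured" */
	bool migrate_tsc_khz;	/* compat property; false on old machine types */
};

static bool tsc_khz_needed(const struct x86_cpu_model *cpu)
{
	/* Subsection is emitted only when there is a rate to migrate
	 * AND the machine type is new enough to expect it. */
	return cpu->migrate_tsc_khz && cpu->tsc_khz != 0;
}
```

With this shape, migrating on an old machine type behaves exactly as before the patch, which is what keeps backwards migration safe.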
Re: [PATCH 3/3] kvm-all: notice KVM of vcpu's TSC rate after migration
On Wed, Sep 30, 2015 at 08:32:26AM +0800, Haozhong Zhang wrote:
> > [...]
> > > > Or maybe we shouldn't treat this as VM state, but as configuration,
> > > > and let management configure the TSC frequency explicitly if the
> > > > user really needs it to stay the same during migration.
> > > >
> > > > (CCing libvir-list to see if they have feedback)
> > >
> > > Thanks for the CC. I'll consider adding a command line option to
> > > control the configuration of the guest TSC frequency.
> >
> > It already exists, -cpu has a "tsc-freq" option.
>
> What I'm considering is to add a "-keep-tsc-freq" option to control
> whether the TSC frequency should be migrated when "tsc-freq" is not
> present. Compared to "tsc-freq", "-keep-tsc-freq" frees users from
> having to figure out the guest TSC frequency manually for the
> migration.

If you do that, please make it a property of the CPU object, so it can
be set as a "-cpu" option.

--
Eduardo
[RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
For MSI interrupts to work with a pass-through device, we need a mapping
of the msi-pages in the iommu. On some platforms (like x86) this
msi-page mapping happens magically: the platform chooses an iova which
it somehow knows will never overlap with guest memory. But this magic
iova selection may not hold on all platforms (like PowerPC and ARM64).
Also, on x86 there is no problem as long as an x86 guest runs on an x86
host, but there can be issues when running a non-x86 guest on an x86
host, or with other userspace applications (I think ODP/DPDK), as in
these cases the chosen iova can overlap with the guest memory mapping.

This patch adds an interface to iommu-map and iommu-unmap msi-pages at
a reserved iova chosen by userspace.

Signed-off-by: Bharat Bhushan
---
 drivers/vfio/vfio.c             |  52 +++
 drivers/vfio/vfio_iommu_type1.c | 111
 include/linux/vfio.h            |   9 +++-
 3 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2fb29df..a817d2d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }

+int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
+			uint32_t size, uint64_t *msi_iova)
+{
+	struct vfio_container *container = device->group->container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	/* Validate address and size */
+	if (!msi_addr || !size || !msi_iova)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (!driver || !driver->ops || !driver->ops->msi_map) {
+		up_read(&container->group_lock);
+		return -EINVAL;
+	}
+
+	ret = driver->ops->msi_map(container->iommu_data,
+				   msi_addr, size, msi_iova);
+
+	up_read(&container->group_lock);
+	return ret;
+}
+
+int vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
+			  uint64_t size)
+{
+	struct vfio_container *container = device->group->container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	/* Validate address and size */
+	if (!msi_iova || !size)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
+		up_read(&container->group_lock);
+		return -EINVAL;
+	}
+
+	ret = driver->ops->msi_unmap(container->iommu_data,
+				     msi_iova, size);
+
+	up_read(&container->group_lock);
+	return ret;
+}
+
 /**
  * VFIO driver API
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3315fb6..ab376c2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1003,12 +1003,34 @@ out_free:
 	return ret;
 }

+static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu *iommu)
+{
+	struct vfio_resvd_region *region;
+	struct vfio_domain *d;
+
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		list_for_each_entry(d, &iommu->domain_list, next) {
+			if (!region->map_paddr)
+				continue;
+
+			if (!iommu_iova_to_phys(d->domain, region->iova))
+				continue;
+
+			iommu_unmap(d->domain, region->iova, PAGE_SIZE);
+			region->map_paddr = 0;
+			cond_resched();
+		}
+	}
+}
+
 static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 {
 	struct rb_node *node;

 	while ((node = rb_first(&iommu->dma_list)))
 		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
+
+	vfio_iommu_unmap_all_reserved_regions(iommu);
 }

 static void vfio_iommu_type1_detach_group(void *iommu_data,
@@ -1048,6 +1070,93 @@ done:
 	mutex_unlock(&iommu->lock);
 }

+static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t msi_addr,
+				    uint64_t size, uint64_t *msi_iova)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_resvd_region *region;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+
+	/* Do not try to create iommu-mapping if msi reconfig not allowed */
+	if (!iommu->allow_msi_reconfig) {
+		mutex_unlock(&iommu->lock);
+		return 0;
+	}
+
+	/* Check if there is already a region mapping the msi page */
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		if (region->map_paddr == msi_addr) {
+			*msi_iova = region->iova;
+			region->refcount++;
+
[RFC PATCH 0/6] vfio: Add interface to map MSI pages
This patch series adds the interface to map MSI pages in the iommu for
msi-capable device pass-through using vfio.

The first patch adds a generic interface to set reserved iova regions.
These reserved regions can be used for mapping physical addresses.
Follow-up patches use these reserved iovas for mapping msi-pages.

This patch series does not provide an interface to let user-space know
the minimum number of reserved iova regions required on a given
platform. That interface can be added once this patch series has been
reviewed and is in an acceptable state.

Bharat Bhushan (6):
  vfio: Add interface for add/del reserved iova region
  iommu: Add interface to get msi-pages mapping attributes
  vfio: Extend iommu-info to return MSIs automap state
  vfio: Add interface to iommu-map/unmap MSI pages
  vfio-pci: Create iommu mapping for msi interrupt
  arm-smmu: Allow to set iommu mapping for MSI

 drivers/iommu/arm-smmu.c          |   8 +
 drivers/iommu/fsl_pamu_domain.c   |   3 +
 drivers/iommu/iommu.c             |  14 ++
 drivers/vfio/pci/vfio_pci_intrs.c |  36 +++-
 drivers/vfio/vfio.c               |  52 ++
 drivers/vfio/vfio_iommu_type1.c   | 344 +-
 include/linux/iommu.h             |   9 +-
 include/linux/vfio.h              |   9 +-
 include/uapi/linux/vfio.h         |  46 +
 9 files changed, 516 insertions(+), 5 deletions(-)

--
1.9.3
RE: [RFC 0/2] VFIO: Add virtual MSI doorbell support.
> -Original Message- > From: Christoffer Dall [mailto:christoffer.d...@linaro.org] > Sent: Friday, September 25, 2015 10:42 PM > To: Marc Zyngier> Cc: Bhushan Bharat-R65777 ; Pranavkumar > Sawargaonkar ; kvm@vger.kernel.org; Alex > Williamson ; kvm...@lists.cs.columbia.edu; > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Will > Deacon ; bhelg...@google.com; a...@arndb.de; > rob.herr...@linaro.org; eric.au...@linaro.org; patc...@apm.com; Yoder > Stuart-B08248 > Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support. > > On Tue, Sep 22, 2015 at 11:09:14PM +0100, Marc Zyngier wrote: > > On Tue, 4 Aug 2015 06:52:01 +0100 > > Bhushan Bharat wrote: > > > > > > > > > > > > -Original Message- > > > > From: Pranavkumar Sawargaonkar [mailto:pranavku...@linaro.org] > > > > Sent: Tuesday, August 04, 2015 11:18 AM > > > > To: Bhushan Bharat-R65777 > > > > Cc: kvm@vger.kernel.org; Alex Williamson; > > > > kvm...@lists.cs.columbia.edu; > > > > linux-arm-ker...@lists.infradead.org; > > > > linux-ker...@vger.kernel.org; christoffer.d...@linaro.org; > > > > marc.zyng...@arm.com; will.dea...@arm.com; bhelg...@google.com; > > > > a...@arndb.de; rob.herr...@linaro.org; eric.au...@linaro.org; > > > > patc...@apm.com; Yoder Stuart-B08248 > > > > Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support. 
> > > > > > > > Hi Bharat, > > > > > > > > On 28 July 2015 at 23:28, Alex Williamson > > > > > > > > wrote: > > > > > On Tue, 2015-07-28 at 17:23 +, Bhushan Bharat wrote: > > > > >> Hi Alex, > > > > >> > > > > >> > -Original Message- > > > > >> > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > >> > Sent: Tuesday, July 28, 2015 9:52 PM > > > > >> > To: Pranavkumar Sawargaonkar > > > > >> > Cc: kvm@vger.kernel.org; kvm...@lists.cs.columbia.edu; > > > > >> > linux-arm- ker...@lists.infradead.org; > > > > >> > linux-ker...@vger.kernel.org; christoffer.d...@linaro.org; > > > > >> > marc.zyng...@arm.com; will.dea...@arm.com; > > > > >> > bhelg...@google.com; a...@arndb.de; rob.herr...@linaro.org; > > > > >> > eric.au...@linaro.org; patc...@apm.com; Bhushan > > > > >> > Bharat-R65777; Yoder > > > > >> > Stuart-B08248 > > > > >> > Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support. > > > > >> > > > > > >> > On Fri, 2015-07-24 at 14:33 +0530, Pranavkumar Sawargaonkar > wrote: > > > > >> > > In current VFIO MSI/MSI-X implementation, linux host kernel > > > > >> > > allocates MSI/MSI-X vectors when userspace requests through > > > > >> > > vfio > > > > ioctls. > > > > >> > > Vfio creates irqfd mappings to notify MSI/MSI-X interrupts > > > > >> > > to the userspace when raised. > > > > >> > > Guest OS will see emulated MSI/MSI-X controller and > > > > >> > > receives an interrupt when kernel notifies the same via irqfd. > > > > >> > > > > > > >> > > Host kernel allocates MSI/MSI-X using standard linux > > > > >> > > routines like > > > > >> > > pci_enable_msix_range() and pci_enable_msi_range(). > > > > >> > > These routines along with requset_irq() in host kernel sets > > > > >> > > up MSI/MSI-X vectors with Physical MSI/MSI-X addresses > > > > >> > > provided by interrupt controller driver in host kernel. 
> > > > >> > > > > > > >> > > This means when a device is assigned with the guest OS, > > > > >> > > MSI/MSI-X addresses present in PCIe EP are the PAs > > > > >> > > programmed by the host linux > > > > >> > kernel. > > > > >> > > > > > > >> > > In x86 MSI/MSI-X physical address range is reserved and > > > > >> > > iommu is aware about these addreses and transalation is > > > > >> > > bypassed for these > > > > address range. > > > > >> > > > > > > >> > > Unlike x86, ARM/ARM64 does not reserve MSI/MSI-X Physical > > > > >> > > address range and all the transactions including MSI go > > > > >> > > through iommu/smmu > > > > >> > without bypass. > > > > >> > > This requires extending current vfio MSI layer with > > > > >> > > additional functionality for ARM/ARM64 by 1. Programing > > > > >> > > IOVA (referred as a MSI virtual doorbell address) > > > > >> > >in device's MSI vector as a MSI address. > > > > >> > >This IOVA will be provided by the userspace based on the > > > > >> > >MSI/MSI-X addresses reserved for the guest. > > > > >> > > 2. Create an IOMMU mapping between this IOVA and > > > > >> > >Physical address (PA) assigned to the MSI vector. > > > > >> > > > > > > >> > > This RFC is proposing a solution for MSI/MSI-X passthrough > > > > >> > > for > > > > >> > ARM/ARM64. > > > > >> > > > > > >> > > > > > >> > Hi Pranavkumar, > > > > >> > > > > > >> > Freescale has the same, or very similar, need, so any > > > > >> > solution in this space will need to work for both ARM and > > > > >> > powerpc. I'm not a big
[RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
This patch allows user-space to know whether msi-pages are automatically
mapped with some magic iova or not. Even if the msi-pages are
automatically mapped, user-space may still want to override the
automatic iova selection for the msi-mapping. For that, user-space
needs to know whether it is allowed to change the automatic mapping,
and this API provides that mechanism. Follow-up patches will provide
the way to override it.

Signed-off-by: Bharat Bhushan
---
 drivers/vfio/vfio_iommu_type1.c | 32 ++++++++++++++++++++
 include/uapi/linux/vfio.h       |  3 +++
 2 files changed, 35 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index fa5d3e4..3315fb6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -59,6 +59,7 @@ struct vfio_iommu {
 	struct rb_root		dma_list;
 	bool			v2;
 	bool			nesting;
+	bool			allow_msi_reconfig;
 	struct list_head	reserved_iova_list;
 };

@@ -1117,6 +1118,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }

+static
+int vfio_domains_get_msi_maps(struct vfio_iommu *iommu,
+			      struct iommu_domain_msi_maps *msi_maps)
+{
+	struct vfio_domain *d;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+	/* All domains have same msi-automap property, pick first */
+	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+	ret = iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING,
+				    msi_maps);
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1138,6 +1156,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 	} else if (cmd == VFIO_IOMMU_GET_INFO) {
 		struct vfio_iommu_type1_info info;
+		struct iommu_domain_msi_maps msi_maps;
+		int ret;

 		minsz = offsetofend(struct vfio_iommu_type1_info,
 				    iova_pgsizes);

@@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,

 		info.flags = 0;

+		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
+		if (ret)
+			return ret;
+
+		if (msi_maps.override_automap) {
+			info.flags |= VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG;
+			iommu->allow_msi_reconfig = true;
+		}
+
+		if (msi_maps.automap)
+			info.flags |= VFIO_IOMMU_INFO_MSI_AUTOMAP;
+
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);

 		return copy_to_user((void __user *)arg, &info, minsz);

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1abd1a9..9998f6e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -391,6 +391,9 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_MSI_AUTOMAP (1 << 1)	/* MSI pages are auto-mapped
+						   in iommu */
+#define VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG (1 << 2) /* Allows reconfig automap */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
 };

--
1.9.3
[RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
This patch adds the VFIO APIs to add and remove reserved iova regions.
A reserved iova region can be used for mapping some specific physical
address in the iommu. Currently we are planning to use this interface
for adding iova regions for creating iommu mappings of msi-pages, but
the API is designed for future extension where some other physical
address can be mapped.

Signed-off-by: Bharat Bhushan
---
 drivers/vfio/vfio_iommu_type1.c | 201 +++-
 include/uapi/linux/vfio.h       |  43 +
 2 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 57d8c37..fa5d3e4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -59,6 +59,7 @@ struct vfio_iommu {
 	struct rb_root		dma_list;
 	bool			v2;
 	bool			nesting;
+	struct list_head	reserved_iova_list;
 };

 struct vfio_domain {
@@ -77,6 +78,15 @@ struct vfio_dma {
 	int			prot;		/* IOMMU_READ/WRITE */
 };

+struct vfio_resvd_region {
+	dma_addr_t	iova;
+	size_t		size;
+	int		prot;		/* IOMMU_READ/WRITE */
+	int		refcount;	/* ref count of mappings */
+	uint64_t	map_paddr;	/* Mapped Physical Address */
+	struct list_head next;
+};
+
 struct vfio_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
@@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 	return NULL;
 }

+/* This function must be called with iommu->lock held */
+static bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
+					   dma_addr_t start, size_t size)
+{
+	struct vfio_resvd_region *region;
+
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		if (region->iova < start)
+			return (start - region->iova < region->size);
+		else if (start < region->iova)
+			return (region->iova - start < size);
+
+		return (region->size > 0 && size > 0);
+	}
+
+	return false;
+}
+
+/* This function must be called with iommu->lock held */
+static
+struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu *iommu,
+						 dma_addr_t start, size_t size)
+{
+	struct vfio_resvd_region *region;
+
+	list_for_each_entry(region, &iommu->reserved_iova_list, next)
+		if (region->iova == start && region->size == size)
+			return region;
+
+	return NULL;
+}
+
 static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
 	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
@@ -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,

 	mutex_lock(&iommu->lock);

-	if (vfio_find_dma(iommu, iova, size)) {
+	if (vfio_find_dma(iommu, iova, size) ||
+	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
 		mutex_unlock(&iommu->lock);
 		return -EEXIST;
 	}
@@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }

+/* This function must be called with iommu->lock held */
+static
+int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
+				dma_addr_t iova, size_t size, int prot)
+{
+	struct vfio_resvd_region *res_region;
+
+	res_region = vfio_find_resvd_region(iommu, iova, size);
+	/* Region should not be mapped in iommu */
+	if (res_region == NULL || res_region->map_paddr)
+		return -EINVAL;
+
+	list_del(&res_region->next);
+	kfree(res_region);
+	return 0;
+}
+
+/* This function must be called with iommu->lock held */
+static int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
+				       dma_addr_t iova, size_t size, int prot)
+{
+	struct vfio_resvd_region *res_region;
+
+	/* Check overlap with dma mapping and reserved regions */
+	if (vfio_find_dma(iommu, iova, size) ||
+	    vfio_find_resvd_region(iommu, iova, size))
+		return -EEXIST;
+
+	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
+	if (res_region == NULL)
+		return -ENOMEM;
+
+	res_region->iova = iova;
+	res_region->size = size;
+	res_region->prot = prot;
+	res_region->refcount = 0;
+	res_region->map_paddr = 0;
+
+	list_add(&res_region->next, &iommu->reserved_iova_list);
+
+	return 0;
+}
+
+static
+int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
+			struct vfio_iommu_reserved_region_add *region)
+{
+	dma_addr_t iova = region->iova;
+	size_t size =
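The per-region overlap test in vfio_overlap_with_resvd_region() above reduces to a half-open range intersection check: two ranges [a, a+as) and [b, b+bs) overlap iff each starts before the other ends (with the equal-start case overlapping unless one range is empty). A small userspace model of that predicate for a single region — the names are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Model of the patch's overlap test against one reserved region:
 *   region starts lower  -> overlap iff start falls inside the region
 *   candidate starts lower -> overlap iff region start falls inside it
 *   same start            -> overlap unless either range is empty     */
static int ranges_overlap(dma_addr_t a, size_t as, dma_addr_t b, size_t bs)
{
	if (b < a)
		return a - b < bs;
	if (a < b)
		return b - a < as;
	return as > 0 && bs > 0;
}
```

Note that in the posted patch the early `return` sits inside the loop, so only the first list entry is ever tested; reviewers would likely want the loop to `continue` on non-overlap and return true only when a hit is found.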
[RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes
These APIs return the capability of automatically mapping msi-pages in
the iommu with some magic iova, which is what most iommus currently do
and is the default behaviour. Further, the API returns whether the
iommu allows the user to define a different iova for the msi-page
mapping for the domain. This is required when a msi-capable device is
directly assigned to user-space/VM and user-space/VM needs to define a
non-overlapping (with other dma-able address space) iova for the
msi-page mapping in the iommu.

This patch just defines the interface; follow-up patches will extend it.

Signed-off-by: Bharat Bhushan
---
 drivers/iommu/arm-smmu.c        |  3 +++
 drivers/iommu/fsl_pamu_domain.c |  3 +++
 drivers/iommu/iommu.c           | 14 ++
 include/linux/iommu.h           |  9 +-
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 66a803b..a3956fb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		/* Dummy handling added */
+		return 0;
 	default:
 		return -ENODEV;
 	}
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 1d45293..9a94430 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_FSL_PAMUV1:
 		*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
 		break;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		/* Dummy handling added */
+		break;
 	default:
 		pr_debug("Unsupported attribute type\n");
 		ret = -EINVAL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e..16c2eab 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 	bool *paging;
 	int ret = 0;
 	u32 *count;
+	struct iommu_domain_msi_maps *msi_maps;

 	switch (attr) {
 	case DOMAIN_ATTR_GEOMETRY:
@@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 			ret = -ENODEV;

 		break;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		msi_maps = data;
+
+		/* By default MSI-pages are magically mapped with some iova
+		 * and do not allow configuring a different iova.
+		 */
+		msi_maps->automap = true;
+		msi_maps->override_automap = false;
+
+		if (domain->ops->domain_get_attr)
+			ret = domain->ops->domain_get_attr(domain, attr, data);
+
+		break;
 	default:
 		if (!domain->ops->domain_get_attr)
 			return -EINVAL;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0546b87..6d49f3f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -83,6 +83,13 @@ struct iommu_domain {
 	struct iommu_domain_geometry geometry;
 };

+struct iommu_domain_msi_maps {
+	dma_addr_t	base_address;
+	dma_addr_t	size;
+	bool		automap;
+	bool		override_automap;
+};
+
 enum iommu_cap {
 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
 					   transactions */
@@ -111,6 +118,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
+	DOMAIN_ATTR_MSI_MAPPING, /* Provides MSIs mapping in iommu */
 	DOMAIN_ATTR_MAX,
 };

@@ -167,7 +175,6 @@ struct iommu_ops {
 	int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
 	/* Get the numer of window per domain */
 	u32 (*domain_get_windows)(struct iommu_domain *domain);
-
 #ifdef CONFIG_OF_IOMMU
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 #endif
--
1.9.3
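The attribute flow this patch sets up — the core fills in conservative defaults, then the domain driver's get_attr hook may override them — can be modeled in a few lines of userspace C. The stub and helper names below are hypothetical, standing in for iommu_domain_get_attr() and a driver callback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct msi_maps {
	bool automap;		/* pages magically mapped by the platform */
	bool override_automap;	/* userspace may supply its own iova */
};

typedef void (*get_attr_fn)(struct msi_maps *);

/* SMMU-like driver override: no automatic MSI mapping, and the iova
 * is user-configurable (mirrors patch 6/6 of this series). */
static void smmu_get_attr(struct msi_maps *m)
{
	m->automap = false;
	m->override_automap = true;
}

/* Core helper: set defaults first, then let the driver override,
 * mirroring the DOMAIN_ATTR_MSI_MAPPING case in iommu_domain_get_attr(). */
static void get_msi_mapping_attr(get_attr_fn driver, struct msi_maps *m)
{
	m->automap = true;		/* x86-style default */
	m->override_automap = false;
	if (driver)
		driver(m);
}

/* Convenience predicate for checking an expected outcome. */
static int attr_is(get_attr_fn driver, bool automap, bool override)
{
	struct msi_maps m;

	get_msi_mapping_attr(driver, &m);
	return m.automap == automap && m.override_automap == override;
}
```

The design choice worth noting: a driver that implements no hook silently inherits the x86-style "automap, not reconfigurable" behavior, so existing iommus need no changes.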
[RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
An MSI address is allocated and programmed into the pcie device during
interrupt configuration. For a pass-through device, we now try to
create an iommu mapping for this allocated/programmed msi-address. If
the iommu mapping is created and the msi address programmed in the pcie
device differs from the msi-iova as per the iommu programming, then the
pci device is reconfigured to use the msi-iova as its msi address.

Signed-off-by: Bharat Bhushan
---
 drivers/vfio/pci/vfio_pci_intrs.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c9690af 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
 	char *name = msix ? "vfio-msix" : "vfio-msi";
 	struct eventfd_ctx *trigger;
+	struct msi_msg msg;
+	struct vfio_device *device;
+	uint64_t msi_addr, msi_iova;
 	int ret;

 	if (vector >= vdev->num_ctx)
 		return -EINVAL;

+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (device == NULL)
+		return -EINVAL;
+
 	if (vdev->ctx[vector].trigger) {
 		free_irq(irq, vdev->ctx[vector].trigger);
+		get_cached_msi_msg(irq, &msg);
+		msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
+		vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	 * cached value of the message prior to enabling.
 	 */
 	if (msix) {
-		struct msi_msg msg;
-
 		get_cached_msi_msg(irq, &msg);
 		pci_write_msi_msg(irq, &msg);
 	}

+
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {
@@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return ret;
 	}

+	/* Re-program the new-iova in the pci-device in case there is
+	 * a different iommu-mapping created for the programmed msi-address.
+	 */
+	get_cached_msi_msg(irq, &msg);
+	msi_iova = 0;
+	msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
+	ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE, &msi_iova);
+	if (ret) {
+		free_irq(irq, vdev->ctx[vector].trigger);
+		kfree(vdev->ctx[vector].name);
+		eventfd_ctx_put(trigger);
+		return ret;
+	}
+
+	/* Reprogram only if iommu-mapped iova is different from msi-address */
+	if (msi_iova && (msi_iova != msi_addr)) {
+		msg.address_hi = (u32)(msi_iova >> 32);
+		/* Keep Lower bits from original msi message address */
+		msg.address_lo &= PAGE_MASK;
+		msg.address_lo |= (u32)(msi_iova & 0x);
+		pci_write_msi_msg(irq, &msg);
+	}
+
 	vdev->ctx[vector].trigger = trigger;

 	return 0;
--
1.9.3
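The address rewrite at the end of the patch splits a 64-bit iova across the MSI message's hi/lo words. The exact masking in the posted hunk is garbled in the archive (the constant after `&` is truncated), so the sketch below encodes only the stated intent — keep the low in-page offset bits of the original address and substitute the iova's page frame — as a hedged userspace model:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~((uint64_t)PAGE_SIZE - 1))

struct msi_addr {
	uint32_t hi;	/* msg.address_hi */
	uint32_t lo;	/* msg.address_lo */
};

/* Hedged interpretation of the patch's reprogramming step: take the
 * page-frame bits from the iommu-mapped iova, keep the offset bits
 * from the originally programmed MSI address, then split into the
 * 32-bit hi/lo words a struct msi_msg carries. */
static struct msi_addr rewrite_msi_addr(uint64_t orig, uint64_t iova)
{
	uint64_t merged = (iova & PAGE_MASK) | (orig & ~PAGE_MASK);
	struct msi_addr m = { (uint32_t)(merged >> 32), (uint32_t)merged };

	return m;
}
```

For example, an x86-style doorbell 0xfee00044 remapped to iova 0x1280000000 keeps its 0x044 offset while the frame moves.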
[RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
Finally, the ARM SMMU declares that iommu mappings for MSI pages are
not set automatically and must be set explicitly.

Signed-off-by: Bharat Bhushan
---
 drivers/iommu/arm-smmu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a3956fb..9d37e72 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_domain_msi_maps *msi_maps;

 	switch (attr) {
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
 	case DOMAIN_ATTR_MSI_MAPPING:
-		/* Dummy handling added */
+		msi_maps = data;
+
+		msi_maps->automap = false;
+		msi_maps->override_automap = true;
+
 		return 0;
 	default:
 		return -ENODEV;
--
1.9.3
Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception
On 09/30/2015 01:32 PM, Laurentiu Tudor wrote:
> On 09/25/2015 03:10 AM, Scott Wood wrote:
>> On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote:

[snip]

>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 12d5c67..99ad88a 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct kvm_book3e_206_tlb_entry *stlbe,
>>>  		stlbe->mas2, stlbe->mas7_3);
>>>  }
>>>
>>> +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV)
>>> +static int lrat_next(void)
>>> +{
>>
>> Will anything break by removing the CONFIG_64BIT condition, even if we
>> don't have a 32-bit target that uses this?
>
> Not completely certain, but I remember getting compile or link errors
> on 32-bit e500mc or e500v2. I can recheck if you want.
>
>>> +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn)
>>> +{
>>> +	struct kvm_memory_slot *slot;
>>> +	unsigned long pfn;
>>> +	unsigned long hva;
>>> +	struct vm_area_struct *vma;
>>> +	unsigned long psize;
>>> +	int tsize;
>>> +	unsigned long tsize_pages;
>>> +
>>> +	slot = gfn_to_memslot(vcpu->kvm, gfn);
>>> +	if (!slot) {
>>> +		pr_err_ratelimited("%s: couldn't find memslot for gfn %lx!\n",
>>> +				   __func__, (long)gfn);
>>> +		return;
>>> +	}
>>> +
>>> +	hva = slot->userspace_addr;
>>
>> What if the faulting address is somewhere in the middle of the slot?
>> Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()?
>> In fact there's probably a lot of logic that should be shared between
>> these two functions.
>
> So if my understanding is correct, most of the gfn -> pfn translation
> stuff done in kvmppc_e500_shadow_map() should also be present here.
> If that's the case, maybe I should first extract this code (which
> includes VM_PFNMAP handling) into a separate function and call it from
> both kvmppc_lrat_map() and kvmppc_e500_shadow_map().

Off-topic, but I just noticed that kvmppc_e500_shadow_map() is marked as
inline. Was that on purpose? Is inlining such a large function worth
anything?

---
Best Regards, Laurentiu
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts
Hello! > I reworked the commit message and applied this patch. Thank you very much. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
Hi Bharat, [auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore] config: i386-allmodconfig (attached as .config) reproduce: git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9 # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): >> ERROR: "vfio_device_map_msi" undefined! >> ERROR: "vfio_device_unmap_msi" undefined! --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception
On 09/25/2015 03:10 AM, Scott Wood wrote: > On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote: >> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S >> b/arch/powerpc/kvm/bookehv_interrupts.S >> index 81bd8a07..1e9fa2a 100644 >> --- a/arch/powerpc/kvm/bookehv_interrupts.S >> +++ b/arch/powerpc/kvm/bookehv_interrupts.S >> @@ -62,6 +62,7 @@ >> #define NEED_EMU 0x0001 /* emulation -- save nv regs */ >> #define NEED_DEAR 0x0002 /* save faulting DEAR */ >> #define NEED_ESR 0x0004 /* save faulting ESR */ >> +#define NEED_LPER 0x0008 /* save faulting LPER */ >> >> /* >> * On entry: >> @@ -159,6 +160,12 @@ >> PPC_STL r9, VCPU_FAULT_DEAR(r4) >> .endif >> >> + /* Only supported on 64-bit cores for now */ >> + .if \flags & NEED_LPER >> + mfspr r7, SPRN_LPER >> + std r7, VCPU_FAULT_LPER(r4) >> + .endif > > What's the harm in using PPC_STL anyway? Will do so. > >> /* >> * For input register values, see >> arch/powerpc/include/asm/kvm_booke_hv_asm.h >> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >> b/arch/powerpc/kvm/e500_mmu_host.c >> index 12d5c67..99ad88a 100644 >> --- a/arch/powerpc/kvm/e500_mmu_host.c >> +++ b/arch/powerpc/kvm/e500_mmu_host.c >> @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct >> kvm_book3e_206_tlb_entry *stlbe, >> stlbe->mas2, stlbe->mas7_3); >> } >> >> +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV) >> +static int lrat_next(void) >> +{ > > Will anything break by removing the CONFIG_64BIT condition, even if we don't > have a 32-bit target that uses this? Not completely certain but I remember getting compile or link errors on 32-bit e500mc or e500v2. I can recheck if you want.
>> +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn) >> +{ >> + struct kvm_memory_slot *slot; >> + unsigned long pfn; >> + unsigned long hva; >> + struct vm_area_struct *vma; >> + unsigned long psize; >> + int tsize; >> + unsigned long tsize_pages; >> + >> + slot = gfn_to_memslot(vcpu->kvm, gfn); >> + if (!slot) { >> + pr_err_ratelimited("%s: couldn't find memslot for gfn %lx!\n", >> +__func__, (long)gfn); >> + return; >> + } >> + >> + hva = slot->userspace_addr; > > What if the faulting address is somewhere in the middle of the slot? > Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()? In > fact there's probably a lot of logic that should be shared between these two > functions. So if my understanding is correct most of the gfn -> pfn translation stuff done in kvmppc_e500_shadow_map() should also be present in here. If that's the case maybe I should first extract this code (which includes VM_PFNMAP handling) in a separate function and call it from both kvmppc_lrat_map() and kvmppc_e500_shadow_map(). >> + down_read(&current->mm->mmap_sem); >> + vma = find_vma(current->mm, hva); >> + if (vma && (hva >= vma->vm_start)) { >> + psize = vma_kernel_pagesize(vma); > > What if it's VM_PFNMAP? > >> + } else { >> + pr_err_ratelimited("%s: couldn't find virtual memory address >> for gfn >> %lx!\n", >> +__func__, (long)gfn); >> + up_read(&current->mm->mmap_sem); >> + return; >> + } >> + up_read(&current->mm->mmap_sem); >> + >> + pfn = gfn_to_pfn_memslot(slot, gfn); >> + if (is_error_noslot_pfn(pfn)) { >> + pr_err_ratelimited("%s: couldn't get real page for gfn %lx!\n", >> +__func__, (long)gfn); >> + return; >> + } >> + >> + tsize = __ilog2(psize) - 10; >> + tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT); > > 1UL << ... > > kvmppc_e500_shadow_map needs the same fix. I'll make a distinct patch with the kvmppc_e500_shadow_map() fix.
>> + gfn &= ~(tsize_pages - 1); >> + pfn &= ~(tsize_pages - 1); >> + >> + write_host_lrate(tsize, gfn, pfn, vcpu->kvm->arch.lpid, true); >> + >> + kvm_release_pfn_clean(pfn); >> +} >> + >> +void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu) >> +{ >> + uint32_t mas0, mas1 = 0; >> + int esel; >> + unsigned long flags; >> + >> + local_irq_save(flags); >> + >> + /* LRAT does not have a dedicated instruction for invalidation */ >> + for (esel = 0; esel < get_paca()->tcd_ptr->lrat_max; esel++) { >> + mas0 = MAS0_ATSEL | MAS0_ESEL(esel); >> + mtspr(SPRN_MAS0, mas0); >> + asm volatile("isync; tlbre" : : : "memory"); >> + mas1 = mfspr(SPRN_MAS1) & ~MAS1_VALID; >> + mtspr(SPRN_MAS1, mas1); >> + asm volatile("isync; tlbwe" : : : "memory"); >> + } >> + /* Must clear mas8 for other host tlbwe's */ >> + mtspr(SPRN_MAS8, 0); >> + isync(); >> + >> + local_irq_restore(flags); >> +} >> +#endif /* CONFIG_64BIT
Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
Hi Bharat, [auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore] config: x86_64-rhel (attached as .config) reproduce: git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9 # save the attached .config to linux build tree make ARCH=x86_64 All error/warnings (new ones prefixed by >>): >> ERROR: "vfio_device_map_msi" [drivers/vfio/pci/vfio-pci.ko] undefined! >> ERROR: "vfio_device_unmap_msi" [drivers/vfio/pci/vfio-pci.ko] undefined! --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
[RFC PATCH v5 2/3] vfio: platform: access device property as a list of strings
From: Antonios Motakis
Certain device properties (e.g. the device node name, the compatible string) are available as a list of strings (separated by the null terminating character). Let the VFIO user query this type of property. Signed-off-by: Antonios Motakis Signed-off-by: Baptiste Reynal --- v4 -> v5: - return ENOSPC when the buffer size is too small - remove strlen call v3 -> v4: - The list length is computed before strings copy. If the entire list doesn't fit, no strings are copied to the user. --- drivers/vfio/platform/properties.c | 43 +- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/platform/properties.c b/drivers/vfio/platform/properties.c index 48c90c5..212755f 100644 --- a/drivers/vfio/platform/properties.c +++ b/drivers/vfio/platform/properties.c @@ -22,7 +22,48 @@ static int dev_property_get_strings(struct device *dev, char *name, unsigned *lenp, void __user *datap, unsigned long datasz) { - return -EINVAL; + const char **val; + int n, i, ret; + + if (lenp == NULL) + return -EFAULT; + + *lenp = 0; + + n = device_property_read_string_array(dev, name, NULL, 0); + if (n < 0) + return n; + + val = kcalloc(n, sizeof(char *), GFP_KERNEL); + if (!val) + return -ENOMEM; + + ret = device_property_read_string_array(dev, name, val, n); + if (ret < 0) + goto out; + + for (i = 0; i < n; i++) + *lenp += strlen(val[i]) + 1; + + if (datasz < *lenp) { + ret = -ENOSPC; + goto out; + } + + for (i = 0; i < n; i++) { + size_t len = strlen(val[i]) + 1; + + if (copy_to_user(datap, val[i], len)) { + ret = -EFAULT; + goto out; + } + + datap += len; + } + +out: + kfree(val); + return ret; } static int dev_property_get_uint(struct device *dev, -- 2.6.0
[RFC PATCH v5 3/3] vfio: platform: return device properties as arrays of unsigned integers
From: Antonios Motakis
Certain properties of a device are accessible as an array of unsigned integers, either u64, u32, u16, or u8. Let the VFIO user query this type of device properties. Signed-off-by: Antonios Motakis Signed-off-by: Baptiste Reynal --- v4 -> v5: - fix return error when the buffer size is too small --- drivers/vfio/platform/properties.c | 62 +- 1 file changed, 61 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/platform/properties.c b/drivers/vfio/platform/properties.c index 212755f..a4b6955 100644 --- a/drivers/vfio/platform/properties.c +++ b/drivers/vfio/platform/properties.c @@ -70,7 +70,67 @@ static int dev_property_get_uint(struct device *dev, char *name, uint32_t type, unsigned *lenp, void __user *datap, unsigned long datasz) { - return -EINVAL; + int ret, n; + u8 *out; + size_t sz; + int (*func)(const struct device *, const char *, void *, size_t) + = NULL; + + switch (type) { + case VFIO_DEV_PROPERTY_TYPE_U64: + sz = sizeof(u64); + func = (int (*)(const struct device *, + const char *, void *, size_t)) + device_property_read_u64_array; + break; + case VFIO_DEV_PROPERTY_TYPE_U32: + sz = sizeof(u32); + func = (int (*)(const struct device *, + const char *, void *, size_t)) + device_property_read_u32_array; + break; + case VFIO_DEV_PROPERTY_TYPE_U16: + sz = sizeof(u16); + func = (int (*)(const struct device *, + const char *, void *, size_t)) + device_property_read_u16_array; + break; + case VFIO_DEV_PROPERTY_TYPE_U8: + sz = sizeof(u8); + func = (int (*)(const struct device *, + const char *, void *, size_t)) + device_property_read_u8_array; + break; + + default: + return -EINVAL; + } + + /* get size of array */ + n = func(dev, name, NULL, 0); + if (n < 0) + return n; + + if (lenp) + *lenp = n * sz; + + if (n * sz > datasz) + return -ENOSPC; + + out = kcalloc(n, sz, GFP_KERNEL); + if (!out) + return -ENOMEM; + + ret = func(dev, name, out, n); + if (ret) + goto out; + + if (copy_to_user(datap, out, n * sz)) + ret = -EFAULT; + +out: +
kfree(out); + return ret; } int vfio_platform_dev_properties(struct device *dev, -- 2.6.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
> -----Original Message----- > From: kbuild test robot [mailto:l...@intel.com] > Sent: Wednesday, September 30, 2015 4:33 PM > To: Bhushan Bharat-R65777> Cc: kbuild-...@01.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; > alex.william...@redhat.com; christoffer.d...@linaro.org; > eric.au...@linaro.org; pranavku...@linaro.org; marc.zyng...@arm.com; > will.dea...@arm.com; Bhushan Bharat-R65777 > > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi > interrupt > > Hi Bharat, > > [auto build test results on v4.3-rc3 -- if it's inappropriate base, please > ignore] > > config: x86_64-rhel (attached as .config) > reproduce: > git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9 > # save the attached .config to linux build tree > make ARCH=x86_64 > > All error/warnings (new ones prefixed by >>): > > >> ERROR: "vfio_device_map_msi" [drivers/vfio/pci/vfio-pci.ko] undefined! > >> ERROR: "vfio_device_unmap_msi" [drivers/vfio/pci/vfio-pci.ko] > undefined! Yes, this is the problem, I will correct this in next version. Thanks -Bharat > > --- > 0-DAY kernel test infrastructure Open Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote: > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar wrote: > > > > Linus, what's your preference? > > So quite frankly, is there any reason we don't just implement > native_read_msr() as just > >unsigned long long native_read_msr(unsigned int msr) >{ > int err; > unsigned long long val; > > val = native_read_msr_safe(msr, &err); > WARN_ON_ONCE(err); > return val; >} > > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be > done with it. I don't see the downside. > > How many msr reads are so critical that the function call > overhead would matter? Get rid of the inline version of the _safe() > thing too, and put that thing there too. There are a few in the perf code, and esp. on cores without a stack engine the call overhead is noticeable. Also note that the perf MSRs are generally optimized MSRs and less slow (we cannot say fast, they're still MSRs) than regular MSRs.
Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Peter Zijlstra wrote: > On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote: > > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar wrote: > > > > > > Linus, what's your preference? > > > > So quite frankly, is there any reason we don't just implement > > native_read_msr() as just > > > >unsigned long long native_read_msr(unsigned int msr) > >{ > > int err; > > unsigned long long val; > > > > val = native_read_msr_safe(msr, &err); > > WARN_ON_ONCE(err); > > return val; > >} > > > > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be > > done with it. I don't see the downside. > > > > How many msr reads are so critical that the function call > > overhead would matter? Get rid of the inline version of the _safe() > > thing too, and put that thing there too. > > There are a few in the perf code, and esp. on cores without a stack engine > the > call overhead is noticeable. Also note that the perf MSRs are generally > optimized MSRs and less slow (we cannot say fast, they're still MSRs) than > regular MSRs. These could still be open coded in an inlined fashion, like the scheduler usage. Thanks, Ingo
Re: [PATCH v4 07/11] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
Hi Christoffer, On 29/09/15 14:44, Christoffer Dall wrote: > On Wed, Sep 23, 2015 at 06:55:04PM +0100, Andre Przywara wrote: >> Salut Marc, >> >> I know that this patch is already merged, but >> >> On 07/08/15 16:45, Marc Zyngier wrote: >>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c >>> index 51c9900..9d009d2 100644 >> ... >>> @@ -1364,6 +1397,39 @@ static bool vgic_process_maintenance(struct kvm_vcpu >>> *vcpu) >>> return level_pending; >>> } >>> >>> +/* >>> + * Save the physical active state, and reset it to inactive. >>> + * >>> + * Return 1 if HW interrupt went from active to inactive, and 0 otherwise. >>> + */ >>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr) >>> +{ >>> + struct irq_phys_map *map; >>> + int ret; >>> + >>> + if (!(vlr.state & LR_HW)) >>> + return 0; >>> + >>> + map = vgic_irq_map_search(vcpu, vlr.irq); >>> + BUG_ON(!map || !map->active); >>> + >>> + ret = irq_get_irqchip_state(map->irq, >>> + IRQCHIP_STATE_ACTIVE, >>> + &map->active); >>> + >>> + WARN_ON(ret); >>> + >>> + if (map->active) { >>> + ret = irq_set_irqchip_state(map->irq, >>> + IRQCHIP_STATE_ACTIVE, >>> + false); >>> + WARN_ON(ret); >>> + return 0; >>> + } >>> + >>> + return 1; >>> +} >>> + >>> /* Sync back the VGIC state after a guest run */ >>> static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu) >>> { >>> @@ -1378,14 +1444,31 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu >>> *vcpu) >>> elrsr = vgic_get_elrsr(vcpu); >>> elrsr_ptr = u64_to_bitmask(&elrsr); >>> >>> - /* Clear mappings for empty LRs */ >>> - for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) { >>> + /* Deal with HW interrupts, and clear mappings for empty LRs */ >>> + for (lr = 0; lr < vgic->nr_lr; lr++) { >>> struct vgic_lr vlr; >>> >>> - if (!test_and_clear_bit(lr, vgic_cpu->lr_used)) >>> + if (!test_bit(lr, vgic_cpu->lr_used)) >>> continue; >>> >>> vlr = vgic_get_lr(vcpu, lr); >>> + if (vgic_sync_hwirq(vcpu, vlr)) { >>> + /* >>> +* So this is a HW interrupt that the guest >>> +*
EOI-ed. Clean the LR state and allow the >>> +* interrupt to be sampled again. >>> +*/ >>> + vlr.state = 0; >>> + vlr.hwirq = 0; >>> + vgic_set_lr(vcpu, lr, vlr); >>> + vgic_irq_clear_queued(vcpu, vlr.irq); >> >> Isn't this line altering common VGIC state without holding the lock? >> Eric removed the coarse dist->lock around the whole >> __kvm_vgic_sync_hwstate() function, we take it now in >> vgic_process_maintenance(), but don't hold it here AFAICT. >> As long as we are only dealing with private timer IRQs this is probably >> not a problem, but the IRQ number could be a SPI as well, right? >> > I don't see a problematic race with this though, as all we're doing is > to clear a bit in a bitmap, which is always checked atomically, so > adding a lock around this really doesn't change anything as far as I can > tell. Indeed I found a similar comment in some older revisions of the code. But isn't it that other code holding the lock (thinking about kvm_vgic_flush_hwstate() in particular) assumes that no-one else tinkers with the VGIC state while it holds the lock? So couldn't we (potentially) run into inconsistent state because we cleared the queued bit while the flushing code runs over all interrupts? Maybe not in this particular case, but in general? Haven't looked at your new series yet, but will do this ASAP. Cheers, Andre. > > Nevertheless, my rework series also addresses this. > > -Christoffer > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvmtool] Skip a few messages by default: command line args; flat binary; earlyprintk.
Hi Dimitri, thanks for sharing your patches. On 29/09/15 17:59, Dimitri John Ledkov wrote: > The partial command line args & earlyprintk=serial are still enabled > in the debug mode. The warning that a flat binary kernel image is attempted > to be loaded is completely dropped. These are not that informative, > once one is past initial debugging, and only pollute the console. > > Signed-off-by: Dimitri John Ledkov > --- > builtin-run.c | 10 ++ > kvm.c | 1 - > x86/kvm.c | 8 ++-- > 3 files changed, 12 insertions(+), 7 deletions(-) > > diff --git a/builtin-run.c b/builtin-run.c > index e0c8732..8edbf88 100644 > --- a/builtin-run.c > +++ b/builtin-run.c > @@ -613,10 +613,12 @@ static struct kvm *kvm_cmd_run_init(int argc, const > char **argv) > > kvm->cfg.real_cmdline = real_cmdline; > > - printf(" # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME, > - kvm->cfg.kernel_filename, > - (unsigned long long)kvm->cfg.ram_size / 1024 / 1024, > - kvm->cfg.nrcpus, kvm->cfg.guest_name); > + if (do_debug_print) { > + printf(" # %s run -k %s -m %Lu -c %d --name %s\n", > KVM_BINARY_NAME, > +kvm->cfg.kernel_filename, > +(unsigned long long)kvm->cfg.ram_size / 1024 / 1024, > +kvm->cfg.nrcpus, kvm->cfg.guest_name); > + } I like the general idea. In fact I have this very patch (among others) in my tree too. I applied similar guarding to other messages as well (mostly those that only show up on ARM, but also the "ended normally" message). Like any good UNIX tool kvmtool should keep quiet if it has nothing worthwhile to say ;-) But looking at it more closely, I see that there is pr_debug() defined doing that "if (do_debug_print)" already. The only issue is that it prints source line information, which is not really useful here. But then again there does not seem to be any user of it? So what about the following: - We avoid printing pr_info() messages in the default case. Looking at its current users in the tree this information is not really useful for normal users.
We enable pr_info() output only if do_debug_print is enabled or introduce another command line flag (--verbose?) for that. - We check each user of pr_info() to see whether this information is actually "informational" or whether it should be converted to pr_warn. - We change the above line to use pr_info instead of printf. - We fix the EOL mayhem we have atm while at it. If you don't mind I will give this a try later this week. > > if (init_list__init(kvm) < 0) > die ("Initialisation failed"); > diff --git a/kvm.c b/kvm.c > index 10ed230..1081072 100644 > --- a/kvm.c > +++ b/kvm.c > @@ -378,7 +378,6 @@ bool kvm__load_kernel(struct kvm *kvm, const char > *kernel_filename, > if (ret) > goto found_kernel; > > - pr_warning("%s is not a bzImage. Trying to load it as a flat > binary...", kernel_filename); I think on x86 this message is useful to have: to point people to the fact that they are trying to load a kernel which most probably isn't one. Do you actually load a "flat binary", so not a Linux bzImage? If yes, what is it? Does this work for you? I didn't have the impression that this code was actually used at all. If you do use it, could you please give my kernel loading series [1] a try? I touch flat binary loading there, but had no chance to test it. > #endif > > ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline); > diff --git a/x86/kvm.c b/x86/kvm.c > index 512ad67..4a5fa41 100644 > --- a/x86/kvm.c > +++ b/x86/kvm.c > @@ -124,8 +124,12 @@ void kvm__arch_set_cmdline(char *cmdline, bool video) > "i8042.dumbkbd=1 i8042.nopnp=1"); > if (video) > strcat(cmdline, " video=vesafb console=tty0"); > - else > - strcat(cmdline, " console=ttyS0 earlyprintk=serial > i8042.noaux=1"); > + else { > + strcat(cmdline, " console=ttyS0 i8042.noaux=1"); > + if (do_debug_print) { > + strcat(cmdline, " earlyprintk=serial"); > + } > + } I am not completely convinced of this one. 
The do_debug_print is meant to affect kvmtool's own debug output only and should really have no effect on the guest's kernel output, shouldn't it? Maybe we should clarify the semantics in the documentation? Cheers, Andre. [1] http://marc.info/?l=kvm&m=143825354808135 > } > > /* Architecture-specific KVM init */
Re: [PATCH kvmtool] kvmtool: expose the TSC Deadline Timer feature to the guest
Hi Dimitri, On 29/09/15 18:00, Dimitri John Ledkov wrote: > From: Arjan van de Ven> > with the TSC deadline timer feature, we don't need to calibrate the apic > timers anymore, which saves more than 100 milliseconds of boot time. > > Signed-off-by: Arjan van de Ven > Signed-off-by: Dimitri John Ledkov > --- > x86/cpuid.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/x86/cpuid.c b/x86/cpuid.c > index c3b67d9..1d8bd23 100644 > --- a/x86/cpuid.c > +++ b/x86/cpuid.c > @@ -31,6 +31,9 @@ static void filter_cpuid(struct kvm_cpuid2 *kvm_cpuid) > /* Set X86_FEATURE_HYPERVISOR */ > if (entry->index == 0) > entry->ecx |= (1 << 31); > +/* Set CPUID_EXT_TSC_DEADLINE_TIMER*/ > + if (entry->index == 0) > + entry->ecx |= (1 << 24); This can only be enabled if the kernel supports emulation of that feature (reported via KVM_CAP_TSC_DEADLINE_TIMER) (cf: Documentation/virtual/kvm/api.txt and respective QEMU code in target-i386/kvm.c) Cheers, Andre. > break; > case 6: > /* Clear X86_FEATURE_EPB */ > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception
On Wed, 2015-09-30 at 14:27 +0300, Laurentiu Tudor wrote: > On 09/30/2015 01:32 PM, Laurentiu Tudor wrote: > > On 09/25/2015 03:10 AM, Scott Wood wrote: > > > On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote: > > [snip] > > > > > b/arch/powerpc/kvm/e500_mmu_host.c > > > > index 12d5c67..99ad88a 100644 > > > > --- a/arch/powerpc/kvm/e500_mmu_host.c > > > > +++ b/arch/powerpc/kvm/e500_mmu_host.c > > > > @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct > > > > kvm_book3e_206_tlb_entry *stlbe, > > > > stlbe->mas2, stlbe->mas7_3); > > > > } > > > > > > > > +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV) > > > > +static int lrat_next(void) > > > > +{ > > > > > > Will anything break by removing the CONFIG_64BIT condition, even if we > > > don't > > > have a 32-bit target that uses this? > > > > Not completely certain but I remember getting compile or link errors > > on 32-bit e500mc or e500v2. I can recheck if you want. > > > > > > +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn) > > > > +{ > > > > + struct kvm_memory_slot *slot; > > > > + unsigned long pfn; > > > > + unsigned long hva; > > > > + struct vm_area_struct *vma; > > > > + unsigned long psize; > > > > + int tsize; > > > > + unsigned long tsize_pages; > > > > + > > > > + slot = gfn_to_memslot(vcpu->kvm, gfn); > > > > + if (!slot) { > > > > + pr_err_ratelimited("%s: couldn't find memslot for gfn > > > > %lx!\n", > > > > +__func__, (long)gfn); > > > > + return; > > > > + } > > > > + > > > > + hva = slot->userspace_addr; > > > > > > What if the faulting address is somewhere in the middle of the slot? > > > Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()? > > > In > > > fact there's probably a lot of logic that should be shared between > > > these two > > > functions. > > > > So if my understanding is correct most of the gfn -> pfn translation > > stuff done in kvmppc_e500_shadow_map() should also be present in here.
> > If that's the case maybe I should first extract this code (which includes > > VM_PFNMAP handling) in a separate function and call it from both > > kvmppc_lrat_map() > > and kvmppc_e500_shadow_map(). > > > > Off-topic, but just noticed that kvmppc_e500_shadow_map() is marked as > inline. > Was that on purpose? Is inlining such a large function worth anything? I don't remember the intent. It was probably a lot smaller back then. That said, it's only used two places, with probably pretty good temporal separation between performance-intensive uses of one versus the other (so not a huge icache concern), and a pretty good portion of the function will be optimized out in the caller with tlbsel == 0. -Scott