Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread H. Peter Anvin
On 09/21/2015 09:36 AM, Linus Torvalds wrote:
> 
> How many msr reads are so critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
> 

Probably only the ones that may go in the context switch path.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread Andy Lutomirski
On Wed, Sep 30, 2015 at 7:01 AM, Ingo Molnar  wrote:
>
> * Peter Zijlstra  wrote:
>
>> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
>> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar  wrote:
>> > >
>> > > Linus, what's your preference?
>> >
>> > So quite frankly, is there any reason we don't just implement
>> > native_read_msr() as just
>> >
>> >unsigned long long native_read_msr(unsigned int msr)
>> >{
>> >   int err;
>> >   unsigned long long val;
>> >
>> >   val = native_read_msr_safe(msr, );
>> >   WARN_ON_ONCE(err);
>> >   return val;
>> >}
>> >
>> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
>> > done with it. I don't see the downside.
>> >
>> > How many msr reads are so critical that the function call
>> > overhead would matter? Get rid of the inline version of the _safe()
>> > thing too, and put that thing there too.
>>
>> There are a few in the perf code, and esp. on cores without a stack engine 
>> the
>> call overhead is noticeable. Also note that the perf MSRs are generally
>> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than
>> regular MSRs.
>
> These could still be open coded in an inlined fashion, like the scheduler 
> usage.
>

We could have a raw_rdmsr for those.

OTOH, I'm still not 100% convinced that this warn-but-don't-die
behavior is worth the effort.  This isn't a frequent source of bugs to
my knowledge, and we don't try to recover from incorrect cr writes,
out-of-bounds MMIO, etc, so do we really gain much by rigging a
recovery mechanism for rdmsr and wrmsr failures for code that doesn't
use the _safe variants?

--Andy

> Thanks,
>
> Ingo



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/3] target-i386: add a subsection of vcpu's TSC rate in vmstate_x86_cpu

2015-09-30 Thread Dr. David Alan Gilbert
* Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> On Tue, Sep 29, 2015 at 08:00:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> > > The newly added subsection 'vmstate_tsc_khz' in this patch results in
> > > vcpu's TSC rate being saved on the source machine and loaded on the
> > > target machine during the migration.
> > > 
> > > Signed-off-by: Haozhong Zhang 
> > 
> > Hi,
> >   I'd appreciate it if you could tie this to only do it on newer
> > machine types; that way it won't break back migration.
> >
> 
> Does "back migration" mean migrating from QEMU w/ vmstate_tsc_khz
> subsection to QEMU w/o that subsection?

Yes; like if we migrate from a newer qemu to an older qemu but with
the same machine type.

Dave

> 
> - Haozhong
> 
> > Dave
> > 
> > > ---
> > >  target-i386/machine.c | 20 
> > >  1 file changed, 20 insertions(+)
> > > 
> > > diff --git a/target-i386/machine.c b/target-i386/machine.c
> > > index 9fa0563..80108a3 100644
> > > --- a/target-i386/machine.c
> > > +++ b/target-i386/machine.c
> > > @@ -752,6 +752,25 @@ static const VMStateDescription vmstate_xss = {
> > >  }
> > >  };
> > >  
> > > +static bool tsc_khz_needed(void *opaque)
> > > +{
> > > +X86CPU *cpu = opaque;
> > > +CPUX86State *env = >env;
> > > +
> > > +return env->tsc_khz != 0;
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_tsc_khz = {
> > > +.name = "cpu/tsc_khz",
> > > +.version_id = 1,
> > > +.minimum_version_id = 1,
> > > +.needed = tsc_khz_needed,
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_INT64(env.tsc_khz, X86CPU),
> > > +VMSTATE_END_OF_LIST()
> > > +}
> > > +};
> > > +
> > >  VMStateDescription vmstate_x86_cpu = {
> > >  .name = "cpu",
> > >  .version_id = 12,
> > > @@ -871,6 +890,7 @@ VMStateDescription vmstate_x86_cpu = {
> > >  _msr_hyperv_crash,
> > >  _avx512,
> > >  _xss,
> > > +_tsc_khz,
> > >  NULL
> > >  }
> > >  };
> > > -- 
> > > 2.4.8
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-all: notice KVM of vcpu's TSC rate after migration

2015-09-30 Thread Eduardo Habkost
On Wed, Sep 30, 2015 at 08:32:26AM +0800, Haozhong Zhang wrote:
> > [...]
> > > > Or maybe we shouldn't treat this as VM state, but as configuration, and
> > > > let management configure the TSC frequency explicitly if the user really
> > > > needs it to stay the same during migration.
> > > >
> > > > (CCing libvir-list to see if they have feedback)
> > > >
> > > 
> > > Thanks for CC. I'll consider to add a command line option to control
> > > the configuration of guest TSC fequency.
> > 
> > It already exists, -cpu has a "tsc-freq" option.
> >
> 
> What I'm considering is to add a "-keep-tsc-freq" option to control
> whether the TSC frequency should be migrated if "tsc-freq" is not
> presented. Compared to "tsc-freq", "-keep-tsc-freq" can free users
> from figuring out the guest TSC frequency manually in the migration.

If you do that, please make it a property of the CPU object, so it will
can be set as a "-cpu" option.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages

2015-09-30 Thread Bharat Bhushan
For MSI interrupts to work for a pass-through devices we need
to have mapping of msi-pages in iommu. Now on some platforms
(like x86) does this msi-pages mapping happens magically and in these
case they chooses an iova which they somehow know that it will never
overlap with guest memory. But this magic iova selection
may not be always true for all platform (like PowerPC and ARM64).

Also on x86 platform, there is no problem as long as running a x86-guest
on x86-host but there can be issues when running a non-x86 guest on
x86 host or other userspace applications like (I think ODP/DPDK).
As in these cases there can be chances that it overlaps with guest
memory mapping.

This patch add interface to iommu-map and iommu-unmap msi-pages at
reserved iova chosen by userspace.

Signed-off-by: Bharat Bhushan 
---
 drivers/vfio/vfio.c |  52 +++
 drivers/vfio/vfio_iommu_type1.c | 111 
 include/linux/vfio.h|   9 +++-
 3 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2fb29df..a817d2d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct notifier_block 
*nb,
return NOTIFY_OK;
 }
 
+int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
+   uint32_t size, uint64_t *msi_iova)
+{
+   struct vfio_container *container = device->group->container;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   /* Validate address and size */
+   if (!msi_addr || !size || !msi_iova)
+   return -EINVAL;
+
+   down_read(>group_lock);
+
+   driver = container->iommu_driver;
+   if (!driver || !driver->ops || !driver->ops->msi_map) {
+   up_read(>group_lock);
+   return -EINVAL;
+   }
+
+   ret = driver->ops->msi_map(container->iommu_data,
+  msi_addr, size, msi_iova);
+
+   up_read(>group_lock);
+   return ret;
+}
+
+int vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
+ uint64_t size)
+{
+   struct vfio_container *container = device->group->container;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   /* Validate address and size */
+   if (!msi_iova || !size)
+   return -EINVAL;
+
+   down_read(>group_lock);
+
+   driver = container->iommu_driver;
+   if (!driver || !driver->ops || !driver->ops->msi_unmap) {
+   up_read(>group_lock);
+   return -EINVAL;
+   }
+
+   ret = driver->ops->msi_unmap(container->iommu_data,
+msi_iova, size);
+
+   up_read(>group_lock);
+   return ret;
+}
+
 /**
  * VFIO driver API
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3315fb6..ab376c2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1003,12 +1003,34 @@ out_free:
return ret;
 }
 
+static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu *iommu)
+{
+   struct vfio_resvd_region *region;
+   struct vfio_domain *d;
+
+   list_for_each_entry(region, >reserved_iova_list, next) {
+   list_for_each_entry(d, >domain_list, next) {
+   if (!region->map_paddr)
+   continue;
+
+   if (!iommu_iova_to_phys(d->domain, region->iova))
+   continue;
+
+   iommu_unmap(d->domain, region->iova, PAGE_SIZE);
+   region->map_paddr = 0;
+   cond_resched();
+   }
+   }
+}
+
 static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 {
struct rb_node *node;
 
while ((node = rb_first(>dma_list)))
vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
+
+   vfio_iommu_unmap_all_reserved_regions(iommu);
 }
 
 static void vfio_iommu_type1_detach_group(void *iommu_data,
@@ -1048,6 +1070,93 @@ done:
mutex_unlock(>lock);
 }
 
+static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t msi_addr,
+   uint64_t size, uint64_t *msi_iova)
+{
+   struct vfio_iommu *iommu = iommu_data;
+   struct vfio_resvd_region *region;
+   int ret;
+
+   mutex_lock(>lock);
+
+   /* Do not try ceate iommu-mapping if msi reconfig not allowed */
+   if (!iommu->allow_msi_reconfig) {
+   mutex_unlock(>lock);
+   return 0;
+   }
+
+   /* Check if there is already region mapping the msi page */
+   list_for_each_entry(region, >reserved_iova_list, next) {
+   if (region->map_paddr == msi_addr) {
+   *msi_iova = region->iova;
+   region->refcount++;
+  

[RFC PATCH 0/6] vfio: Add interface to map MSI pages

2015-09-30 Thread Bharat Bhushan
This patch series add the interface to map MSI pages in iommu
for msi-capable device pass-through using vfio.

First patch adds a generic interface to set reserved iova regions.
These reserved regions can be used for mapping physical address.
Follow-up patches uses these reserved iova for mapping msi-pages.

This patch series does not provide interface to let user-space know
how many minimum reserved iova regions are required on a given platform.
This interface can be added once this patches series get reviewed and will
be in acceptable state.

Bharat Bhushan (6):
  vfio: Add interface for add/del reserved iova region
  iommu: Add interface to get msi-pages mapping attributes
  vfio: Extend iommu-info to return MSIs automap state
  vfio: Add interface to iommu-map/unmap MSI pages
  vfio-pci: Create iommu mapping for msi interrupt
  arm-smmu: Allow to set iommu mapping for MSI

 drivers/iommu/arm-smmu.c  |   8 +
 drivers/iommu/fsl_pamu_domain.c   |   3 +
 drivers/iommu/iommu.c |  14 ++
 drivers/vfio/pci/vfio_pci_intrs.c |  36 +++-
 drivers/vfio/vfio.c   |  52 ++
 drivers/vfio/vfio_iommu_type1.c   | 344 +-
 include/linux/iommu.h |   9 +-
 include/linux/vfio.h  |   9 +-
 include/uapi/linux/vfio.h |  46 +
 9 files changed, 516 insertions(+), 5 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC 0/2] VFIO: Add virtual MSI doorbell support.

2015-09-30 Thread Bhushan Bharat


> -Original Message-
> From: Christoffer Dall [mailto:christoffer.d...@linaro.org]
> Sent: Friday, September 25, 2015 10:42 PM
> To: Marc Zyngier 
> Cc: Bhushan Bharat-R65777 ; Pranavkumar
> Sawargaonkar ; kvm@vger.kernel.org; Alex
> Williamson ; kvm...@lists.cs.columbia.edu;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; Will
> Deacon ; bhelg...@google.com; a...@arndb.de;
> rob.herr...@linaro.org; eric.au...@linaro.org; patc...@apm.com; Yoder
> Stuart-B08248 
> Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support.
> 
> On Tue, Sep 22, 2015 at 11:09:14PM +0100, Marc Zyngier wrote:
> > On Tue, 4 Aug 2015 06:52:01 +0100
> > Bhushan Bharat  wrote:
> >
> > >
> > >
> > > > -Original Message-
> > > > From: Pranavkumar Sawargaonkar [mailto:pranavku...@linaro.org]
> > > > Sent: Tuesday, August 04, 2015 11:18 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: kvm@vger.kernel.org; Alex Williamson;
> > > > kvm...@lists.cs.columbia.edu;
> > > > linux-arm-ker...@lists.infradead.org;
> > > > linux-ker...@vger.kernel.org; christoffer.d...@linaro.org;
> > > > marc.zyng...@arm.com; will.dea...@arm.com; bhelg...@google.com;
> > > > a...@arndb.de; rob.herr...@linaro.org; eric.au...@linaro.org;
> > > > patc...@apm.com; Yoder Stuart-B08248
> > > > Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support.
> > > >
> > > > Hi Bharat,
> > > >
> > > > On 28 July 2015 at 23:28, Alex Williamson
> > > > 
> > > > wrote:
> > > > > On Tue, 2015-07-28 at 17:23 +, Bhushan Bharat wrote:
> > > > >> Hi Alex,
> > > > >>
> > > > >> > -Original Message-
> > > > >> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > >> > Sent: Tuesday, July 28, 2015 9:52 PM
> > > > >> > To: Pranavkumar Sawargaonkar
> > > > >> > Cc: kvm@vger.kernel.org; kvm...@lists.cs.columbia.edu;
> > > > >> > linux-arm- ker...@lists.infradead.org;
> > > > >> > linux-ker...@vger.kernel.org; christoffer.d...@linaro.org;
> > > > >> > marc.zyng...@arm.com; will.dea...@arm.com;
> > > > >> > bhelg...@google.com; a...@arndb.de; rob.herr...@linaro.org;
> > > > >> > eric.au...@linaro.org; patc...@apm.com; Bhushan
> > > > >> > Bharat-R65777; Yoder
> > > > >> > Stuart-B08248
> > > > >> > Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support.
> > > > >> >
> > > > >> > On Fri, 2015-07-24 at 14:33 +0530, Pranavkumar Sawargaonkar
> wrote:
> > > > >> > > In current VFIO MSI/MSI-X implementation, linux host kernel
> > > > >> > > allocates MSI/MSI-X vectors when userspace requests through
> > > > >> > > vfio
> > > > ioctls.
> > > > >> > > Vfio creates irqfd mappings to notify MSI/MSI-X interrupts
> > > > >> > > to the userspace when raised.
> > > > >> > > Guest OS will see emulated MSI/MSI-X controller and
> > > > >> > > receives an interrupt when kernel notifies the same via irqfd.
> > > > >> > >
> > > > >> > > Host kernel allocates MSI/MSI-X using standard linux
> > > > >> > > routines like
> > > > >> > > pci_enable_msix_range() and pci_enable_msi_range().
> > > > >> > > These routines along with requset_irq() in host kernel sets
> > > > >> > > up MSI/MSI-X vectors with Physical MSI/MSI-X addresses
> > > > >> > > provided by interrupt controller driver in host kernel.
> > > > >> > >
> > > > >> > > This means when a device is assigned with the guest OS,
> > > > >> > > MSI/MSI-X addresses present in PCIe EP are the PAs
> > > > >> > > programmed by the host linux
> > > > >> > kernel.
> > > > >> > >
> > > > >> > > In x86 MSI/MSI-X physical address range is reserved and
> > > > >> > > iommu is aware about these addreses and transalation is
> > > > >> > > bypassed for these
> > > > address range.
> > > > >> > >
> > > > >> > > Unlike x86, ARM/ARM64 does not reserve MSI/MSI-X Physical
> > > > >> > > address range and all the transactions including MSI go
> > > > >> > > through iommu/smmu
> > > > >> > without bypass.
> > > > >> > > This requires extending current vfio MSI layer with
> > > > >> > > additional functionality for ARM/ARM64 by 1. Programing
> > > > >> > > IOVA (referred as a MSI virtual doorbell address)
> > > > >> > >in device's MSI vector as a MSI address.
> > > > >> > >This IOVA will be provided by the userspace based on the
> > > > >> > >MSI/MSI-X addresses reserved for the guest.
> > > > >> > > 2. Create an IOMMU mapping between this IOVA and
> > > > >> > >Physical address (PA) assigned to the MSI vector.
> > > > >> > >
> > > > >> > > This RFC is proposing a solution for MSI/MSI-X passthrough
> > > > >> > > for
> > > > >> > ARM/ARM64.
> > > > >> >
> > > > >> >
> > > > >> > Hi Pranavkumar,
> > > > >> >
> > > > >> > Freescale has the same, or very similar, need, so any
> > > > >> > solution in this space will need to work for both ARM and
> > > > >> > powerpc.  I'm not a big 

[RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state

2015-09-30 Thread Bharat Bhushan
This patch allows the user-space to know whether msi-pages
are automatically mapped with some magic iova or not.

Even if the msi-pages are automatically mapped, still user-space
wants to over-ride the automatic iova selection for msi-mapping.
For this user-space need to know whether it is allowed to change
the automatic mapping or not and this API provides this mechanism.
Follow up patches will provide how to over-ride this.

Signed-off-by: Bharat Bhushan 
---
 drivers/vfio/vfio_iommu_type1.c | 32 
 include/uapi/linux/vfio.h   |  3 +++
 2 files changed, 35 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index fa5d3e4..3315fb6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -59,6 +59,7 @@ struct vfio_iommu {
struct rb_root  dma_list;
boolv2;
boolnesting;
+   boolallow_msi_reconfig;
struct list_headreserved_iova_list;
 };
 
@@ -1117,6 +1118,23 @@ static int vfio_domains_have_iommu_cache(struct 
vfio_iommu *iommu)
return ret;
 }
 
+static
+int vfio_domains_get_msi_maps(struct vfio_iommu *iommu,
+ struct iommu_domain_msi_maps *msi_maps)
+{
+   struct vfio_domain *d;
+   int ret;
+
+   mutex_lock(>lock);
+   /* All domains have same msi-automap property, pick first */
+   d = list_first_entry(>domain_list, struct vfio_domain, next);
+   ret = iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING,
+   msi_maps);
+   mutex_unlock(>lock);
+
+   return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -1138,6 +1156,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
} else if (cmd == VFIO_IOMMU_GET_INFO) {
struct vfio_iommu_type1_info info;
+   struct iommu_domain_msi_maps msi_maps;
+   int ret;
 
minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
@@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
info.flags = 0;
 
+   ret = vfio_domains_get_msi_maps(iommu, _maps);
+   if (ret)
+   return ret;
+
+   if (msi_maps.override_automap) {
+   info.flags |= VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG;
+   iommu->allow_msi_reconfig = true;
+   }
+
+   if (msi_maps.automap)
+   info.flags |= VFIO_IOMMU_INFO_MSI_AUTOMAP;
+
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
return copy_to_user((void __user *)arg, , minsz);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1abd1a9..9998f6e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -391,6 +391,9 @@ struct vfio_iommu_type1_info {
__u32   argsz;
__u32   flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
+#define VFIO_IOMMU_INFO_MSI_AUTOMAP (1 << 1)   /* MSI pages are auto-mapped
+  in iommu */
+#define VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG (1 << 2) /* Allows reconfig 
automap*/
__u64   iova_pgsizes;   /* Bitmap of supported page sizes */
 };
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region

2015-09-30 Thread Bharat Bhushan
This Patch adds the VFIO APIs to add and remove reserved iova
regions. The reserved iova region can be used for mapping some
specific physical address in iommu.

Currently we are planning to use this interface for adding iova
regions for creating iommu of msi-pages. But the API are designed
for future extension where some other physical address can be mapped.

Signed-off-by: Bharat Bhushan 
---
 drivers/vfio/vfio_iommu_type1.c | 201 +++-
 include/uapi/linux/vfio.h   |  43 +
 2 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 57d8c37..fa5d3e4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -59,6 +59,7 @@ struct vfio_iommu {
struct rb_root  dma_list;
boolv2;
boolnesting;
+   struct list_headreserved_iova_list;
 };
 
 struct vfio_domain {
@@ -77,6 +78,15 @@ struct vfio_dma {
int prot;   /* IOMMU_READ/WRITE */
 };
 
+struct vfio_resvd_region {
+   dma_addr_t  iova;
+   size_t  size;
+   int prot;   /* IOMMU_READ/WRITE */
+   int refcount;   /* ref count of mappings */
+   uint64_tmap_paddr;  /* Mapped Physical Address */
+   struct list_head next;
+};
+
 struct vfio_group {
struct iommu_group  *iommu_group;
struct list_headnext;
@@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu 
*iommu,
return NULL;
 }
 
+/* This function must be called with iommu->lock held */
+static bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
+  dma_addr_t start, size_t size)
+{
+   struct vfio_resvd_region *region;
+
+   list_for_each_entry(region, >reserved_iova_list, next) {
+   if (region->iova < start)
+   return (start - region->iova < region->size);
+   else if (start < region->iova)
+   return (region->iova - start < size);
+
+   return (region->size > 0 && size > 0);
+   }
+
+   return false;
+}
+
+/* This function must be called with iommu->lock held */
+static
+struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu *iommu,
+dma_addr_t start, size_t size)
+{
+   struct vfio_resvd_region *region;
+
+   list_for_each_entry(region, >reserved_iova_list, next)
+   if (region->iova == start && region->size == size)
+   return region;
+
+   return NULL;
+}
+
 static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
struct rb_node **link = >dma_list.rb_node, *parent = NULL;
@@ -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
mutex_lock(>lock);
 
-   if (vfio_find_dma(iommu, iova, size)) {
+   if (vfio_find_dma(iommu, iova, size) ||
+   vfio_overlap_with_resvd_region(iommu, iova, size)) {
mutex_unlock(>lock);
return -EEXIST;
}
@@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
return ret;
 }
 
+/* This function must be called with iommu->lock held */
+static
+int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
+   dma_addr_t iova, size_t size, int prot)
+{
+   struct vfio_resvd_region *res_region;
+
+   res_region = vfio_find_resvd_region(iommu, iova, size);
+   /* Region should not be mapped in iommu */
+   if (res_region == NULL || res_region->map_paddr)
+   return -EINVAL;
+
+   list_del(_region->next);
+   kfree(res_region);
+   return 0;
+}
+
+/* This function must be called with iommu->lock held */
+static int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
+  dma_addr_t iova, size_t size, int prot)
+{
+   struct vfio_resvd_region *res_region;
+
+   /* Check overlap with with dma maping and reserved regions */
+   if (vfio_find_dma(iommu, iova, size) ||
+   vfio_find_resvd_region(iommu, iova, size))
+   return -EEXIST;
+
+   res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
+   if (res_region == NULL)
+   return -ENOMEM;
+
+   res_region->iova = iova;
+   res_region->size = size;
+   res_region->prot = prot;
+   res_region->refcount = 0;
+   res_region->map_paddr = 0;
+
+   list_add(_region->next, >reserved_iova_list);
+
+   return 0;
+}
+
+static
+int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
+   struct vfio_iommu_reserved_region_add *region)
+{
+   dma_addr_t iova = region->iova;
+   size_t size = 

[RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes

2015-09-30 Thread Bharat Bhushan
This APIs return the capability of automatically mapping msi-pages
in iommu with some magic iova. Which is what currently most of
iommu's does and is the default behaviour.

Further API returns whether iommu allows the user to define different
iova for mai-page mapping for the domain. This is required when a msi
capable device is directly assigned to user-space/VM and user-space/VM
need to define a non-overlapping (from other dma-able address space)
iova for msi-pages mapping in iommu.

This patch just define the interface and follow up patches will
extend this interface.

Signed-off-by: Bharat Bhushan 
---
 drivers/iommu/arm-smmu.c|  3 +++
 drivers/iommu/fsl_pamu_domain.c |  3 +++
 drivers/iommu/iommu.c   | 14 ++
 include/linux/iommu.h   |  9 -
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 66a803b..a3956fb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_MSI_MAPPING:
+   /* Dummy handling added */
+   return 0;
default:
return -ENODEV;
}
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 1d45293..9a94430 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct iommu_domain 
*domain,
case DOMAIN_ATTR_FSL_PAMUV1:
*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
break;
+   case DOMAIN_ATTR_MSI_MAPPING:
+   /* Dummy handling added */
+   break;
default:
pr_debug("Unsupported attribute type\n");
ret = -EINVAL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e..16c2eab 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
bool *paging;
int ret = 0;
u32 *count;
+   struct iommu_domain_msi_maps *msi_maps;
 
switch (attr) {
case DOMAIN_ATTR_GEOMETRY:
@@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
ret = -ENODEV;
 
break;
+   case DOMAIN_ATTR_MSI_MAPPING:
+   msi_maps = data;
+
+   /* Default MSI-pages are magically mapped with some iova and
+* do now allow to configure with different iova.
+*/
+   msi_maps->automap = true;
+   msi_maps->override_automap = false;
+
+   if (domain->ops->domain_get_attr)
+   ret = domain->ops->domain_get_attr(domain, attr, data);
+
+   break;
default:
if (!domain->ops->domain_get_attr)
return -EINVAL;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0546b87..6d49f3f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -83,6 +83,13 @@ struct iommu_domain {
struct iommu_domain_geometry geometry;
 };
 
+struct iommu_domain_msi_maps {
+   dma_addr_t base_address;
+   dma_addr_t size;
+   bool automap;
+   bool override_automap;
+};
+
 enum iommu_cap {
IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
   transactions */
@@ -111,6 +118,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMU_ENABLE,
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
+   DOMAIN_ATTR_MSI_MAPPING, /* Provides MSIs mapping in iommu */
DOMAIN_ATTR_MAX,
 };
 
@@ -167,7 +175,6 @@ struct iommu_ops {
int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
/* Get the numer of window per domain */
u32 (*domain_get_windows)(struct iommu_domain *domain);
-
 #ifdef CONFIG_OF_IOMMU
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 #endif
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-09-30 Thread Bharat Bhushan
An MSI-address is allocated and programmed in pcie device
during interrupt configuration. Now for a pass-through device,
try to create the iommu mapping for this allocted/programmed
msi-address.  If the iommu mapping is created and the msi
address programmed in the pcie device is different from
msi-iova as per iommu programming then reconfigure the pci
device to use msi-iova as msi address.

Signed-off-by: Bharat Bhushan 
---
 drivers/vfio/pci/vfio_pci_intrs.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c9690af 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
char *name = msix ? "vfio-msix" : "vfio-msi";
struct eventfd_ctx *trigger;
+   struct msi_msg msg;
+   struct vfio_device *device;
+   uint64_t msi_addr, msi_iova;
int ret;
 
if (vector >= vdev->num_ctx)
return -EINVAL;
 
+   device = vfio_device_get_from_dev(>dev);
+   if (device == NULL)
+   return -EINVAL;
+
if (vdev->ctx[vector].trigger) {
free_irq(irq, vdev->ctx[vector].trigger);
+   get_cached_msi_msg(irq, );
+   msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
+   vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
vdev->ctx[vector].trigger = NULL;
@@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
 * cached value of the message prior to enabling.
 */
if (msix) {
-   struct msi_msg msg;
-
get_cached_msi_msg(irq, );
pci_write_msi_msg(irq, );
}
 
+
ret = request_irq(irq, vfio_msihandler, 0,
  vdev->ctx[vector].name, trigger);
if (ret) {
@@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
return ret;
}
 
+   /* Re-program the new-iova in pci-device in case there is
+* different iommu-mapping created for programmed msi-address.
+*/
+   get_cached_msi_msg(irq, );
+   msi_iova = 0;
+   msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
+   ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE, _iova);
+   if (ret) {
+   free_irq(irq, vdev->ctx[vector].trigger);
+   kfree(vdev->ctx[vector].name);
+   eventfd_ctx_put(trigger);
+   return ret;
+   }
+
+   /* Reprogram only if iommu-mapped iova is different from msi-address */
+   if (msi_iova && (msi_iova != msi_addr)) {
+   msg.address_hi = (u32)(msi_iova >> 32);
+   /* Keep Lower bits from original msi message address */
+   msg.address_lo &= PAGE_MASK;
+   msg.address_lo |= (u32)(msi_iova & 0x);
+   pci_write_msi_msg(irq, );
+   }
+
vdev->ctx[vector].trigger = trigger;
 
return 0;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI

2015-09-30 Thread Bharat Bhushan
Finally ARM SMMU declare that iommu-mapping for MSI-pages are not
set automatically and it should be set explicitly.

Signed-off-by: Bharat Bhushan 
---
 drivers/iommu/arm-smmu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a3956fb..9d37e72 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
enum iommu_attr attr, void *data)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct iommu_domain_msi_maps *msi_maps;
 
switch (attr) {
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
case DOMAIN_ATTR_MSI_MAPPING:
-   /* Dummy handling added */
+   msi_maps = data;
+
+   msi_maps->automap = false;
+   msi_maps->override_automap = true;
+
return 0;
default:
return -ENODEV;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception

2015-09-30 Thread Laurentiu Tudor
On 09/30/2015 01:32 PM, Laurentiu Tudor wrote:
> On 09/25/2015 03:10 AM, Scott Wood wrote:
>> On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote:

[snip]

>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 12d5c67..99ad88a 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct 
>>> kvm_book3e_206_tlb_entry *stlbe,
>>> stlbe->mas2, stlbe->mas7_3);
>>>  }
>>>  
>>> +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV)
>>> +static int lrat_next(void)
>>> +{
>>
>> Will anything break by removing the CONFIG_64BIT condition, even if we don't 
>> have a 32-bit target that uses this?
> 
> Not completly certain but i remember getting compile or link errors
> on 32-bit e500mc or e500v2. I can recheck if you want.
> 
>>> +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn)
>>> +{
>>> + struct kvm_memory_slot *slot;
>>> + unsigned long pfn;
>>> + unsigned long hva;
>>> + struct vm_area_struct *vma;
>>> + unsigned long psize;
>>> + int tsize;
>>> + unsigned long tsize_pages;
>>> +
>>> + slot = gfn_to_memslot(vcpu->kvm, gfn);
>>> + if (!slot) {
>>> + pr_err_ratelimited("%s: couldn't find memslot for gfn %lx!\n",
>>> +__func__, (long)gfn);
>>> + return;
>>> + }
>>> +
>>> + hva = slot->userspace_addr;
>>
>> What if the faulting address is somewhere in the middle of the slot?  
>> Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()?  In 
>> fact there's probably a lot of logic that should be shared between these two 
>> functions.
> 
> So if my understanding is correct most of the gfn -> pfn translation
> stuff done in kvmppc_e500_shadow_map() should also be present in here.
> If that's the case maybe i should first extract this code (which includes
> VM_PFNMAP handling) in a separate function and call it from both 
> kvmppc_lrat_map()
> and kvmppc_e500_shadow_map(). 
> 

Off-topic, but just noticed that kvmppc_e500_shadow_map() is marked as inline.
Was that on purpose? Is inlining such a large function worth anything?

---
Best Regards, Laurentiu
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts

2015-09-30 Thread Pavel Fedin
 Hello!

> I reworked the commit message and applied this patch.

 Thank you very much.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-09-30 Thread kbuild test robot
Hi Bharat,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
ignore]

config: i386-allmodconfig (attached as .config)
reproduce:
  git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

>> ERROR: "vfio_device_map_msi" undefined!
>> ERROR: "vfio_device_unmap_msi" undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception

2015-09-30 Thread Laurentiu Tudor
On 09/25/2015 03:10 AM, Scott Wood wrote:
> On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote:
>> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
>> b/arch/powerpc/kvm/bookehv_interrupts.S
>> index 81bd8a07..1e9fa2a 100644
>> --- a/arch/powerpc/kvm/bookehv_interrupts.S
>> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
>> @@ -62,6 +62,7 @@
>>  #define NEED_EMU 0x0001 /* emulation -- save nv regs */
>>  #define NEED_DEAR0x0002 /* save faulting DEAR */
>>  #define NEED_ESR 0x0004 /* save faulting ESR */
>> +#define NEED_LPER0x0008 /* save faulting LPER */
>>  
>>  /*
>>   * On entry:
>> @@ -159,6 +160,12 @@
>>   PPC_STL r9, VCPU_FAULT_DEAR(r4)
>>   .endif
>>  
>> + /* Only supported on 64-bit cores for now */
>> + .if \flags & NEED_LPER
>> + mfspr   r7, SPRN_LPER
>> + std r7, VCPU_FAULT_LPER(r4)
>> + .endif
> 
> What's the harm in using PPC_STL anyway?

Will do so.
 
> 
>>  /*
>>   * For input register values, see 
>> arch/powerpc/include/asm/kvm_booke_hv_asm.h
>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c 
>> b/arch/powerpc/kvm/e500_mmu_host.c
>> index 12d5c67..99ad88a 100644
>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>> @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct 
>> kvm_book3e_206_tlb_entry *stlbe,
>> stlbe->mas2, stlbe->mas7_3);
>>  }
>>  
>> +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV)
>> +static int lrat_next(void)
>> +{
> 
> Will anything break by removing the CONFIG_64BIT condition, even if we don't 
> have a 32-bit target that uses this?

Not completly certain but i remember getting compile or link errors
on 32-bit e500mc or e500v2. I can recheck if you want.

>> +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +{
>> + struct kvm_memory_slot *slot;
>> + unsigned long pfn;
>> + unsigned long hva;
>> + struct vm_area_struct *vma;
>> + unsigned long psize;
>> + int tsize;
>> + unsigned long tsize_pages;
>> +
>> + slot = gfn_to_memslot(vcpu->kvm, gfn);
>> + if (!slot) {
>> + pr_err_ratelimited("%s: couldn't find memslot for gfn %lx!\n",
>> +__func__, (long)gfn);
>> + return;
>> + }
>> +
>> + hva = slot->userspace_addr;
> 
> What if the faulting address is somewhere in the middle of the slot?  
> Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()?  In 
> fact there's probably a lot of logic that should be shared between these two 
> functions.

So if my understanding is correct most of the gfn -> pfn translation
stuff done in kvmppc_e500_shadow_map() should also be present in here.
If that's the case maybe i should first extract this code (which includes
VM_PFNMAP handling) in a separate function and call it from both 
kvmppc_lrat_map()
and kvmppc_e500_shadow_map(). 

 
>> + down_read(>mm->mmap_sem);
>> + vma = find_vma(current->mm, hva);
>> + if (vma && (hva >= vma->vm_start)) {
>> + psize = vma_kernel_pagesize(vma);
> 
> What if it's VM_PFNMAP?
> 
>> + } else {
>> + pr_err_ratelimited("%s: couldn't find virtual memory address 
>> for gfn 
>> %lx!\n",
>> +__func__, (long)gfn);
>> + up_read(>mm->mmap_sem);
>> + return;
>> + }
>> + up_read(>mm->mmap_sem);
>> +
>> + pfn = gfn_to_pfn_memslot(slot, gfn);
>> + if (is_error_noslot_pfn(pfn)) {
>> + pr_err_ratelimited("%s: couldn't get real page for gfn %lx!\n",
>> +__func__, (long)gfn);
>> + return;
>> + }
>> +
>> + tsize = __ilog2(psize) - 10;
>> + tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
> 
> 1UL << ...
> 
> kvmppc_e500_shadow_map needs the same fix.

I'll make a distinct patch with the kvmppc_e500_shadow_map() fix.

>> + gfn &= ~(tsize_pages - 1);
>> + pfn &= ~(tsize_pages - 1);
>> +
>> + write_host_lrate(tsize, gfn, pfn, vcpu->kvm->arch.lpid, true);
>> +
>> + kvm_release_pfn_clean(pfn);
>> +}
>> +
>> +void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu)
>> +{
>> + uint32_t mas0, mas1 = 0;
>> + int esel;
>> + unsigned long flags;
>> +
>> + local_irq_save(flags);
>> +
>> + /* LRAT does not have a dedicated instruction for invalidation */
>> + for (esel = 0; esel < get_paca()->tcd_ptr->lrat_max; esel++) {
>> + mas0 = MAS0_ATSEL | MAS0_ESEL(esel);
>> + mtspr(SPRN_MAS0, mas0);
>> + asm volatile("isync; tlbre" : : : "memory");
>> + mas1 = mfspr(SPRN_MAS1) & ~MAS1_VALID;
>> + mtspr(SPRN_MAS1, mas1);
>> + asm volatile("isync; tlbwe" : : : "memory");
>> + }
>> + /* Must clear mas8 for other host tlbwe's */
>> + mtspr(SPRN_MAS8, 0);
>> + isync();
>> +
>> + local_irq_restore(flags);
>> +}
>> +#endif /* CONFIG_64BIT 

Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-09-30 Thread kbuild test robot
Hi Bharat,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
ignore]

config: x86_64-rhel (attached as .config)
reproduce:
  git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9
  # save the attached .config to linux build tree
  make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

>> ERROR: "vfio_device_map_msi" [drivers/vfio/pci/vfio-pci.ko] undefined!
>> ERROR: "vfio_device_unmap_msi" [drivers/vfio/pci/vfio-pci.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


[RFC PATCH v5 2/3] vfio: platform: access device property as a list of strings

2015-09-30 Thread Baptiste Reynal
From: Antonios Motakis 

Certain device properties (e.g. the device node name, the compatible
string), are available as a list of strings (separated by the null
terminating character). Let the VFIO user query this type of properties.

Signed-off-by: Antonios Motakis 
Signed-off-by: Baptiste Reynal 

---
v4 -> v5:
 - return ENOSPC when the buffer size is too small
 - remove strlen call

v3 -> v4:
 - The list length is computed before strings copy. If the entire list
   doesn't fit, no strings are copied to the user.
---
 drivers/vfio/platform/properties.c | 43 +-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/properties.c 
b/drivers/vfio/platform/properties.c
index 48c90c5..212755f 100644
--- a/drivers/vfio/platform/properties.c
+++ b/drivers/vfio/platform/properties.c
@@ -22,7 +22,48 @@ static int dev_property_get_strings(struct device *dev,
char *name, unsigned *lenp,
void __user *datap, unsigned long datasz)
 {
-   return -EINVAL;
+   const char **val;
+   int n, i, ret;
+
+   if (lenp == NULL)
+   return -EFAULT;
+
+   *lenp = 0;
+
+   n = device_property_read_string_array(dev, name, NULL, 0);
+   if (n < 0)
+   return n;
+
+   val = kcalloc(n, sizeof(char *), GFP_KERNEL);
+   if (!val)
+   return -ENOMEM;
+
+   ret = device_property_read_string_array(dev, name, val, n);
+   if (ret < 0)
+   goto out;
+
+   for (i = 0; i < n; i++)
+   *lenp += strlen(val[i]) + 1;
+
+   if (datasz < *lenp) {
+   ret = -ENOSPC;
+   goto out;
+   }
+
+   for (i = 0; i < n; i++) {
+   size_t len = strlen(val[i]) + 1;
+
+   if (copy_to_user(datap, val[i], len)) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   datap += len;
+   }
+
+out:
+   kfree(val);
+   return ret;
 }
 
 static int dev_property_get_uint(struct device *dev,
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v5 3/3] vfio: platform: return device properties as arrays of unsigned integers

2015-09-30 Thread Baptiste Reynal
From: Antonios Motakis 

Certain properties of a device are accessible as an array of unsigned
integers, either u64, u32, u16, or u8. Let the VFIO user query this
type of device properties.

Signed-off-by: Antonios Motakis 
Signed-off-by: Baptiste Reynal 

---
v4 -> v5:
 - fix return error when the buffer size is too small
---
 drivers/vfio/platform/properties.c | 62 +-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/properties.c 
b/drivers/vfio/platform/properties.c
index 212755f..a4b6955 100644
--- a/drivers/vfio/platform/properties.c
+++ b/drivers/vfio/platform/properties.c
@@ -70,7 +70,67 @@ static int dev_property_get_uint(struct device *dev,
 char *name, uint32_t type, unsigned *lenp,
 void __user *datap, unsigned long datasz)
 {
-   return -EINVAL;
+   int ret, n;
+   u8 *out;
+   size_t sz;
+   int (*func)(const struct device *, const char *, void *, size_t)
+   = NULL;
+
+   switch (type) {
+   case VFIO_DEV_PROPERTY_TYPE_U64:
+   sz = sizeof(u64);
+   func = (int (*)(const struct device *,
+   const char *, void *, size_t))
+   device_property_read_u64_array;
+   break;
+   case VFIO_DEV_PROPERTY_TYPE_U32:
+   sz = sizeof(u32);
+   func = (int (*)(const struct device *,
+   const char *, void *, size_t))
+   device_property_read_u32_array;
+   break;
+   case VFIO_DEV_PROPERTY_TYPE_U16:
+   sz = sizeof(u16);
+   func = (int (*)(const struct device *,
+   const char *, void *, size_t))
+   device_property_read_u16_array;
+   break;
+   case VFIO_DEV_PROPERTY_TYPE_U8:
+   sz = sizeof(u8);
+   func = (int (*)(const struct device *,
+   const char *, void *, size_t))
+   device_property_read_u8_array;
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   /* get size of array */
+   n = func(dev, name, NULL, 0);
+   if (n < 0)
+   return n;
+
+   if (lenp)
+   *lenp = n * sz;
+
+   if (n * sz > datasz)
+   return -ENOSPC;
+
+   out = kcalloc(n, sz, GFP_KERNEL);
+   if (!out)
+   return -ENOMEM;
+
+   ret = func(dev, name, out, n);
+   if (ret)
+   goto out;
+
+   if (copy_to_user(datap, out, n * sz))
+   ret = -EFAULT;
+
+out:
+   kfree(out);
+   return ret;
 }
 
 int vfio_platform_dev_properties(struct device *dev,
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-09-30 Thread Bhushan Bharat

> -Original Message-
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: Wednesday, September 30, 2015 4:33 PM
> To: Bhushan Bharat-R65777 
> Cc: kbuild-...@01.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> alex.william...@redhat.com; christoffer.d...@linaro.org;
> eric.au...@linaro.org; pranavku...@linaro.org; marc.zyng...@arm.com;
> will.dea...@arm.com; Bhushan Bharat-R65777
> 
> Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> interrupt
> 
> Hi Bharat,
> 
> [auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
> ignore]
> 
> config: x86_64-rhel (attached as .config)
> reproduce:
>   git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9
>   # save the attached .config to linux build tree
>   make ARCH=x86_64
> 
> All error/warnings (new ones prefixed by >>):
> 
> >> ERROR: "vfio_device_map_msi" [drivers/vfio/pci/vfio-pci.ko] undefined!
> >> ERROR: "vfio_device_unmap_msi" [drivers/vfio/pci/vfio-pci.ko]
> undefined!

Yes, this is the problem, I will correct this in next version.

Thanks
-Bharat

> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread Peter Zijlstra
On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar  wrote:
> >
> > Linus, what's your preference?
> 
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
> 
>unsigned long long native_read_msr(unsigned int msr)
>{
>   int err;
>   unsigned long long val;
> 
>   val = native_read_msr_safe(msr, );
>   WARN_ON_ONCE(err);
>   return val;
>}
> 
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.
> 
> How many msr reads are so critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.

There are a few in the perf code, and esp. on cores without a stack
engine the call overhead is noticeable. Also note that the perf MSRs are
generally optimized MSRs and less slow (we cannot say fast, they're
still MSRs) than regular MSRs.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar  wrote:
> > >
> > > Linus, what's your preference?
> > 
> > So quite frankly, is there any reason we don't just implement
> > native_read_msr() as just
> > 
> >unsigned long long native_read_msr(unsigned int msr)
> >{
> >   int err;
> >   unsigned long long val;
> > 
> >   val = native_read_msr_safe(msr, );
> >   WARN_ON_ONCE(err);
> >   return val;
> >}
> > 
> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> > done with it. I don't see the downside.
> > 
> > How many msr reads are so critical that the function call
> > overhead would matter? Get rid of the inline version of the _safe()
> > thing too, and put that thing there too.
> 
> There are a few in the perf code, and esp. on cores without a stack engine 
> the 
> call overhead is noticeable. Also note that the perf MSRs are generally 
> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than 
> regular MSRs.

These could still be open coded in an inlined fashion, like the scheduler usage.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 07/11] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-09-30 Thread Andre Przywara
Hi Christoffer,

On 29/09/15 14:44, Christoffer Dall wrote:
> On Wed, Sep 23, 2015 at 06:55:04PM +0100, Andre Przywara wrote:
>> Salut Marc,
>>
>> I know that this patch is already merged, but 
>>
>> On 07/08/15 16:45, Marc Zyngier wrote:
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 51c9900..9d009d2 100644
>> ...
>>> @@ -1364,6 +1397,39 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
>>> *vcpu)
>>> return level_pending;
>>>  }
>>>  
>>> +/*
>>> + * Save the physical active state, and reset it to inactive.
>>> + *
>>> + * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
>>> + */
>>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>> +{
>>> +   struct irq_phys_map *map;
>>> +   int ret;
>>> +
>>> +   if (!(vlr.state & LR_HW))
>>> +   return 0;
>>> +
>>> +   map = vgic_irq_map_search(vcpu, vlr.irq);
>>> +   BUG_ON(!map || !map->active);
>>> +
>>> +   ret = irq_get_irqchip_state(map->irq,
>>> +   IRQCHIP_STATE_ACTIVE,
>>> +   >active);
>>> +
>>> +   WARN_ON(ret);
>>> +
>>> +   if (map->active) {
>>> +   ret = irq_set_irqchip_state(map->irq,
>>> +   IRQCHIP_STATE_ACTIVE,
>>> +   false);
>>> +   WARN_ON(ret);
>>> +   return 0;
>>> +   }
>>> +
>>> +   return 1;
>>> +}
>>> +
>>>  /* Sync back the VGIC state after a guest run */
>>>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>>  {
>>> @@ -1378,14 +1444,31 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu 
>>> *vcpu)
>>> elrsr = vgic_get_elrsr(vcpu);
>>> elrsr_ptr = u64_to_bitmask();
>>>  
>>> -   /* Clear mappings for empty LRs */
>>> -   for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
>>> +   /* Deal with HW interrupts, and clear mappings for empty LRs */
>>> +   for (lr = 0; lr < vgic->nr_lr; lr++) {
>>> struct vgic_lr vlr;
>>>  
>>> -   if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
>>> +   if (!test_bit(lr, vgic_cpu->lr_used))
>>> continue;
>>>  
>>> vlr = vgic_get_lr(vcpu, lr);
>>> +   if (vgic_sync_hwirq(vcpu, vlr)) {
>>> +   /*
>>> +* So this is a HW interrupt that the guest
>>> +* EOI-ed. Clean the LR state and allow the
>>> +* interrupt to be sampled again.
>>> +*/
>>> +   vlr.state = 0;
>>> +   vlr.hwirq = 0;
>>> +   vgic_set_lr(vcpu, lr, vlr);
>>> +   vgic_irq_clear_queued(vcpu, vlr.irq);
>>
>> Isn't this line altering common VGIC state without holding the lock?
>> Eric removed the coarse dist->lock around the whole
>> __kvm_vgic_sync_hwstate() function, we take it now in
>> vgic_process_maintenance(), but don't hold it here AFAICT.
>> As long as we are only dealing with private timer IRQs this is probably
>> not a problem, but the IRQ number could be a SPI as well, right?
>>
> I don't see a problematic race with this though, as all we're doing is
> to clear a bit in a bitmap, which is always checked atomically, so
> adding a lock around this really doesn't change anything as far as I can
> tell.

Indeed I found a similar comment in some older revisions of the code.

But isn't it that other code holding the lock (thinking about
kvm_vgic_flush_hwstate() in particular) assumes that no-one else tinkers
with the VGIC state while it holds the lock?
So couldn't we (potentially) run into inconsistent state because we
cleared the queued bit while the flushing code runs over all interrupts?
Maybe not in this particular case, but in general?

Haven't looked at your new series yet, but will do this ASAP.

Cheers,
Andre.

> 
> Nevertheless, my rework series also addresses this.
> 
> -Christoffer
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvmtool] Skip a few messages by default: command line args; flat binary; earlyprintk.

2015-09-30 Thread Andre Przywara
Hi Dimitri,

thanks for sharing your patches.

On 29/09/15 17:59, Dimitri John Ledkov wrote:
> The partial command line args & earlyprintk=serial are still enabled
> in the debug mode. Warning that a flat binary kernel image is attemped
> to be loaded is completely dropped. These are not that informative,
> once one is past intial debugging, and only polute the console.
> 
> Signed-off-by: Dimitri John Ledkov 
> ---
>  builtin-run.c | 10 ++
>  kvm.c |  1 -
>  x86/kvm.c |  8 ++--
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/builtin-run.c b/builtin-run.c
> index e0c8732..8edbf88 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -613,10 +613,12 @@ static struct kvm *kvm_cmd_run_init(int argc, const 
> char **argv)
>  
>   kvm->cfg.real_cmdline = real_cmdline;
>  
> - printf("  # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME,
> - kvm->cfg.kernel_filename,
> - (unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
> - kvm->cfg.nrcpus, kvm->cfg.guest_name);
> + if (do_debug_print) {
> + printf("  # %s run -k %s -m %Lu -c %d --name %s\n", 
> KVM_BINARY_NAME,
> +kvm->cfg.kernel_filename,
> +(unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
> +kvm->cfg.nrcpus, kvm->cfg.guest_name);
> + }

I like the general idea. In fact I have this very patch (among others)
in my tree too. I applied similar guarding to other messages as well
(mostly those that only show up on ARM, but also the "ended normally"
message). Like any good UNIX tool kvmtool should keep quiet if it has
nothing worthwhile to say ;-)
But looking at it more closely, I see that there is pr_debug() defined
doing that "if (do_debug_print)" already. The only issue is that is
prints source line information, which is not really useful here. But
then again there does not seem to be any user of it?

So what about the following:
- We avoid printing pr_info() messages in the default case. Looking at
its current users in the tree this information is not really useful for
normal users. We enable pr_info() output only if do_debug_print is
enabled or introduce another command line flag (--verbose?) for that.
- We check each user of pr_info() to see whether this information is
actually "informational" or whether it should be converted to pr_warn.
- We change the above line to use pr_info instead of printf.
- We fix the EOL mayhem we have atm while at it.

If you don't mind I will give this a try later this week.

>  
>   if (init_list__init(kvm) < 0)
>   die ("Initialisation failed");
> diff --git a/kvm.c b/kvm.c
> index 10ed230..1081072 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -378,7 +378,6 @@ bool kvm__load_kernel(struct kvm *kvm, const char 
> *kernel_filename,
>   if (ret)
>   goto found_kernel;
>  
> - pr_warning("%s is not a bzImage. Trying to load it as a flat 
> binary...", kernel_filename);

I think on x86 this message is useful to have: to point people to the
fact that they are trying to load a kernel which most probably isn't one.
Do you actually load a "flat binary", so not a Linux bzImage? If yes,
what is it? Does this work for you? I didn't have the impression that
this code was actually used at all.
If you do use it, could you please give my kernel loading series [1] a
try? I touch flat binary loading there, but had no chance to test it.

>  #endif
>  
>   ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
> diff --git a/x86/kvm.c b/x86/kvm.c
> index 512ad67..4a5fa41 100644
> --- a/x86/kvm.c
> +++ b/x86/kvm.c
> @@ -124,8 +124,12 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
>   "i8042.dumbkbd=1 i8042.nopnp=1");
>   if (video)
>   strcat(cmdline, " video=vesafb console=tty0");
> - else
> - strcat(cmdline, " console=ttyS0 earlyprintk=serial 
> i8042.noaux=1");
> + else {
> + strcat(cmdline, " console=ttyS0 i8042.noaux=1");
> + if (do_debug_print) {
> + strcat(cmdline, " earlyprintk=serial");
> + }
> + }

I am not completely convinced of this one. The do_debug_print is meant
to affect kvmtool's own debug output only and should really have no
effect on the guest's kernel output, shouldn't it?
Maybe we should clarify the semantics in the documentation?

Cheers,
Andre.

[1] http://marc.info/?l=kvm=143825354808135

>  }
>  
>  /* Architecture-specific KVM init */
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvmtool] kvmtool: expose the TSC Deadline Timer feature to the guest

2015-09-30 Thread Andre Przywara
Hi Dimitri,

On 29/09/15 18:00, Dimitri John Ledkov wrote:
> From: Arjan van de Ven 
> 
> with the TSC deadline timer feature, we don't need to calibrate the apic
> timers anymore, which saves more than 100 milliseconds of boot time.
> 
> Signed-off-by: Arjan van de Ven 
> Signed-off-by: Dimitri John Ledkov 
> ---
>  x86/cpuid.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/x86/cpuid.c b/x86/cpuid.c
> index c3b67d9..1d8bd23 100644
> --- a/x86/cpuid.c
> +++ b/x86/cpuid.c
> @@ -31,6 +31,9 @@ static void filter_cpuid(struct kvm_cpuid2 *kvm_cpuid)
>   /* Set X86_FEATURE_HYPERVISOR */
>   if (entry->index == 0)
>   entry->ecx |= (1 << 31);
> +/* Set CPUID_EXT_TSC_DEADLINE_TIMER*/
> + if (entry->index == 0)
> + entry->ecx |= (1 << 24);

This can only be enabled if the kernel supports emulation of that
feature (reported via KVM_CAP_TSC_DEADLINE_TIMER)
(cf: Documentation/virtual/kvm/api.txt and respective QEMU code in
target-i386/kvm.c)

Cheers,
Andre.

>   break;
>   case 6:
>   /* Clear X86_FEATURE_EPB */
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception

2015-09-30 Thread Scott Wood
On Wed, 2015-09-30 at 14:27 +0300, Laurentiu Tudor wrote:
> On 09/30/2015 01:32 PM, Laurentiu Tudor wrote:
> > On 09/25/2015 03:10 AM, Scott Wood wrote:
> > > On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote:
> 
> [snip]
> 
> > > > b/arch/powerpc/kvm/e500_mmu_host.c
> > > > index 12d5c67..99ad88a 100644
> > > > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > > > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > > > @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct 
> > > > kvm_book3e_206_tlb_entry *stlbe,
> > > > stlbe->mas2, stlbe->mas7_3);
> > > >  }
> > > >  
> > > > +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV)
> > > > +static int lrat_next(void)
> > > > +{
> > > 
> > > Will anything break by removing the CONFIG_64BIT condition, even if we 
> > > don't 
> > > have a 32-bit target that uses this?
> > 
> > Not completly certain but i remember getting compile or link errors
> > on 32-bit e500mc or e500v2. I can recheck if you want.
> > 
> > > > +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn)
> > > > +{
> > > > + struct kvm_memory_slot *slot;
> > > > + unsigned long pfn;
> > > > + unsigned long hva;
> > > > + struct vm_area_struct *vma;
> > > > + unsigned long psize;
> > > > + int tsize;
> > > > + unsigned long tsize_pages;
> > > > +
> > > > + slot = gfn_to_memslot(vcpu->kvm, gfn);
> > > > + if (!slot) {
> > > > + pr_err_ratelimited("%s: couldn't find memslot for gfn 
> > > > %lx!\n",
> > > > +__func__, (long)gfn);
> > > > + return;
> > > > + }
> > > > +
> > > > + hva = slot->userspace_addr;
> > > 
> > > What if the faulting address is somewhere in the middle of the slot?  
> > > Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()?  
> > > In 
> > > fact there's probably a lot of logic that should be shared between 
> > > these two 
> > > functions.
> > 
> > So if my understanding is correct most of the gfn -> pfn translation
> > stuff done in kvmppc_e500_shadow_map() should also be present in here.
> > If that's the case maybe i should first extract this code (which includes
> > VM_PFNMAP handling) in a separate function and call it from both 
> > kvmppc_lrat_map()
> > and kvmppc_e500_shadow_map(). 
> > 
> 
> Off-topic, but just noticed that kvmppc_e500_shadow_map() is marked as 
> inline.
> Was that on purpose? Is inlining such a large function worth anything?

I don't remember the intent.  It was probably a lot smaller back then.  That 
said, it's only used two places, with probably pretty good temporal 
separation between performance-intensive uses of one versus the other (so not 
a huge icache concern), and a pretty good portion of the function will be 
optimized out in the caller with tlbsel == 0.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html