Re: [PATCH] arm64: KVM: Do not inject a 64bit fault for a 32bit guest

2015-08-27 Thread Russell King - ARM Linux
On Thu, Aug 27, 2015 at 03:05:47PM +0100, Marc Zyngier wrote:
 When injecting a fault into a 32bit guest, it seems rather idiotic
 to also inject a 64bit fault that is only going to corrupt the
 guest state, and lead to a situation where we restore an illegal
 context.
 
 Just fix the stupid bug that has been there from day 1.
 
 Cc: sta...@vger.kernel.org
 Reported-by: Russell King li...@arm.linux.org.uk

s/linux/rmk+kernel/ please

Tested-by: Russell King rmk+ker...@arm.linux.org.uk

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
 Will: Paolo being on holiday, do you mind merging this one
 via your tree?

I don't think the commit message does this bug justice.  The implication
is it's just a guest issue.  It isn't, the bug appears to take out the
host kernel in a truly spectacular way.

http://www.arm.linux.org.uk/developer/build/result.php?type=bootidx=4871

Tested here, the fix stops the host kernel exploding.  The crashed kvm
instance can be stopped and a proper kernel can then be booted in a new
guest instance.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4] irqchip: GICv2/v3: Add support for irq_vcpu_affinity

2015-08-27 Thread Eric Auger
Hi Marc,

I tested the series on Calxeda Midway with VFIO use case. Also reviewed
it again without finding anything new.

Tested-by: Eric Auger eric.au...@linaro.org
Reviewed-by: Eric Auger eric.au...@linaro.org

Best Regards

Eric


On 08/26/2015 06:00 PM, Marc Zyngier wrote:
 The GICv2 and GICv3 architectures allow an active physical interrupt
 to be forwarded to a guest, and the guest to indirectly perform the
 deactivation of the interrupt by performing an EOI on the virtual
 interrupt (see for example the GICv2 spec, 3.2.1).
 
 This allows some substantial performance improvement for level
 triggered interrupts that otherwise have to be masked/unmasked in
 VFIO, not to mention the required trap back to KVM when the guest
 performs an EOI.
 
 To enable this, the GICs need to be switched to a different EOImode,
 where a taken interrupt can be left active (which prevents the same
 interrupt from being taken again), while other interrupts are still
 being processed normally.
 
 We also use the new irq_set_vcpu_affinity hook that was introduced for
 Intel's Posted Interrupts to determine whether or not to perform the
 deactivation at EOI-time.
 
 As all of this only makes sense when the kernel can behave as a
 hypervisor, we only enable this mode on detecting that the kernel was
 actually booted in HYP mode, and that the GIC supports this feature.
 
 This series is a complete rework of an RFC I sent over a year ago:
 
 http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266328.html
 
 Since then, a lot has been either merged (the irqchip_state) or reworked
 (my active-timer series: http://www.spinics.net/lists/kvm/msg118768.html),
 and this implements the last few bits for Eric Auger's series to
 finally make it into the kernel:
 
 https://lkml.org/lkml/2015/7/2/268
 https://lkml.org/lkml/2015/7/6/291
 
 With all these patches combined, physical interrupt routing from the
 kernel into a VM becomes possible.
 
 Note that the implementation makes use of the static_key mechanism,
 which is undergoing an extensive rework in 4.3. I intend to convert
 this code once both are in mainline.
 
 This has been tested on Juno (GICv2) and FastModel (GICv3). A branch
 is available at:
 
 git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
 irq/gic-irq-vcpu-affinity-v4
 
 * From v3:
   - Use separate irq_chip structures, leading to much nicer code (tglx)
   - Dropped Eric's Tested/Reviewed-by as there are significant changes
 
 * From v2:
   - Another small fix from Eric
   - Some commit message cleanups
 
 * From v1:
   - Fixes after review from Eric
   - Got rid of the cascaded GICv2 hack (it was broken anyway)
   - Folded the LPI deactivation patch (it makes more sense as part of
 the main one)
   - Some clarifying comments about the deactivate on mask
   - I haven't retained Eric's Reviewed/Tested-by, as the code has
 significantly changed on GICv2
 
 Marc Zyngier (4):
   irqchip: GICv3: Convert to EOImode == 1
   irqchip: GICv3: Don't deactivate interrupts forwarded to a guest
   irqchip: GIC: Convert to EOImode == 1
   irqchip: GIC: Don't deactivate interrupts forwarded to a guest
 
  drivers/irqchip/irq-gic-v3.c   |  70 +--
  drivers/irqchip/irq-gic.c  | 111 -
  include/linux/irqchip/arm-gic-v3.h |   9 +++
  include/linux/irqchip/arm-gic.h|   4 ++
  4 files changed, 188 insertions(+), 6 deletions(-)
 



[PATCH] iommu/s390: add iommu api for s390 pci devices

2015-08-27 Thread Gerald Schaefer
This adds an IOMMU API implementation for s390 PCI devices.

Reviewed-by: Sebastian Ott seb...@linux.vnet.ibm.com
Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com
---
 MAINTAINERS |   7 +
 arch/s390/Kconfig   |   1 +
 arch/s390/include/asm/pci.h |   4 +
 arch/s390/include/asm/pci_dma.h |   5 +-
 arch/s390/pci/pci_dma.c |  37 +++--
 drivers/iommu/Kconfig   |   7 +
 drivers/iommu/Makefile  |   1 +
 drivers/iommu/s390-iommu.c  | 337 
 8 files changed, 386 insertions(+), 13 deletions(-)
 create mode 100644 drivers/iommu/s390-iommu.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 23da5da..8845410 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8937,6 +8937,13 @@ F:   drivers/s390/net/*iucv*
 F: include/net/iucv/
 F: net/iucv/
 
+S390 IOMMU (PCI)
+M: Gerald Schaefer gerald.schae...@de.ibm.com
+L: linux-s...@vger.kernel.org
+W: http://www.ibm.com/developerworks/linux/linux390/
+S: Supported
+F: drivers/iommu/s390-iommu.c
+
 S3C24XX SD/MMC Driver
 M: Ben Dooks ben-li...@fluff.org
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 1d57000..74db332 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -582,6 +582,7 @@ menuconfig PCI
	bool "PCI support"
select HAVE_DMA_ATTRS
select PCI_MSI
+   select IOMMU_SUPPORT
help
  Enable PCI support.
 
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 34d9603..c873e68 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -62,6 +62,8 @@ struct zpci_bar_struct {
u8  size;   /* order 2 exponent */
 };
 
+struct s390_domain;
+
 /* Private data per function */
 struct zpci_dev {
struct pci_dev  *pdev;
@@ -118,6 +120,8 @@ struct zpci_dev {
 
struct dentry   *debugfs_dev;
struct dentry   *debugfs_perf;
+
+   struct s390_domain *s390_domain; /* s390 IOMMU domain data */
 };
 
 static inline bool zdev_enabled(struct zpci_dev *zdev)
diff --git a/arch/s390/include/asm/pci_dma.h b/arch/s390/include/asm/pci_dma.h
index 30b4c17..7a7abf1 100644
--- a/arch/s390/include/asm/pci_dma.h
+++ b/arch/s390/include/asm/pci_dma.h
@@ -192,5 +192,8 @@ static inline unsigned long *get_st_pto(unsigned long entry)
 /* Prototypes */
 int zpci_dma_init_device(struct zpci_dev *);
 void zpci_dma_exit_device(struct zpci_dev *);
-
+void dma_free_seg_table(unsigned long);
+unsigned long *dma_alloc_cpu_table(void);
+void dma_cleanup_tables(unsigned long *);
+void dma_update_cpu_trans(unsigned long *, void *, dma_addr_t, int);
 #endif
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 37505b8..37d10f7 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -24,7 +24,7 @@ static int zpci_refresh_global(struct zpci_dev *zdev)
  zdev->iommu_pages * PAGE_SIZE);
 }
 
-static unsigned long *dma_alloc_cpu_table(void)
+unsigned long *dma_alloc_cpu_table(void)
 {
unsigned long *table, *entry;
 
@@ -114,12 +114,12 @@ static unsigned long *dma_walk_cpu_trans(unsigned long *rto, dma_addr_t dma_addr
return pto[px];
 }
 
-static void dma_update_cpu_trans(struct zpci_dev *zdev, void *page_addr,
-dma_addr_t dma_addr, int flags)
+void dma_update_cpu_trans(unsigned long *dma_table, void *page_addr,
+ dma_addr_t dma_addr, int flags)
 {
unsigned long *entry;
 
-   entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
+   entry = dma_walk_cpu_trans(dma_table, dma_addr);
if (!entry) {
WARN_ON_ONCE(1);
return;
@@ -156,7 +156,8 @@ static int dma_update_trans(struct zpci_dev *zdev, unsigned long pa,
goto no_refresh;
 
	for (i = 0; i < nr_pages; i++) {
-   dma_update_cpu_trans(zdev, page_addr, dma_addr, flags);
+   dma_update_cpu_trans(zdev->dma_table, page_addr, dma_addr,
+flags);
page_addr += PAGE_SIZE;
dma_addr += PAGE_SIZE;
}
@@ -181,7 +182,7 @@ no_refresh:
return rc;
 }
 
-static void dma_free_seg_table(unsigned long entry)
+void dma_free_seg_table(unsigned long entry)
 {
unsigned long *sto = get_rt_sto(entry);
int sx;
@@ -193,21 +194,18 @@ static void dma_free_seg_table(unsigned long entry)
dma_free_cpu_table(sto);
 }
 
-static void dma_cleanup_tables(struct zpci_dev *zdev)
+void dma_cleanup_tables(unsigned long *table)
 {
-   unsigned long *table;
int rtx;
 
-   if (!zdev || !zdev->dma_table)
+   if (!table)
return;
 
-   table = zdev->dma_table;
	for (rtx = 0; rtx < ZPCI_TABLE_ENTRIES; rtx++)
if (reg_entry_isvalid(table[rtx]))

Re: [PATCH v4 0/4] irqchip: GICv2/v3: Add support for irq_vcpu_affinity

2015-08-27 Thread Marc Zyngier
Hi Eric,

On 27/08/15 14:03, Eric Auger wrote:
 Hi Marc,
 
 I tested the series on Calxeda Midway with VFIO use case. Also reviewed
 it again without finding anything new.
 
 Tested-by: Eric Auger eric.au...@linaro.org
 Reviewed-by: Eric Auger eric.au...@linaro.org

Thanks a lot Eric, much appreciated!

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts

2015-08-27 Thread Christoffer Dall
On Thu, Aug 27, 2015 at 02:51:22PM +0300, Pavel Fedin wrote:
 Commit 71760950bf3dc796e5e53ea3300dec724a09f593
 (arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn) introduced
 vgic_queue_irq_to_lr() function with additional vgic_dist_irq_is_pending()
 check before setting LR_STATE_PENDING bit. In some cases it started
 causing the following situation if the userland quickly drops the IRQ back
 to inactive state for some reason:
 1. Userland injects an IRQ with level == 1, this ends up in
vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending()
for this IRQ.
 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
(!!!)
 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
in this case will call vgic_dist_irq_clear_pending() and reset the
pending flag.
 4. vCPU finally wakes up. It successfully rolls through
__kvm_vgic_flush_hwstate(), which populates vGIC registers. However,
since neither pending nor active flags are now set for this IRQ,
vgic_queue_irq_to_lr() does not set any state bits on this LR at all.
Since this is level-sensitive IRQ, we end up in LR containing only
LR_EOI_INT bit, causing unnecessary immediate exit from the guest.
 
 This patch fixes the problem by adding forgotten vgic_cpu_irq_clear().
 This causes the IRQ not to be included in any lists if it has been
 picked up after getting dropped to inactive level. Since this is a
 level-sensitive IRQ, this is correct behavior.
 
 The bug was caught on ARM64 kernel v4.1.6, running qemu virt guest,
 where it was caused by emulated pl011.

It's a bit weird to just send this as a new patch without replying to my
mail from yesterday with feedback, explaining changes from what I did
etc.  Anyway.

 
 Signed-off-by: Pavel Fedin p.fe...@samsung.com
 ---
  virt/kvm/arm/vgic.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 34dad3c..bf155e3 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -,7 +,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, 
 int irq,
   kvm_debug(Set active, clear distributor: 0x%x\n, vlr.state);
   vgic_irq_clear_active(vcpu, irq);
   vgic_update_state(vcpu-kvm);
 - } else if (vgic_dist_irq_is_pending(vcpu, irq)) {
 + } else {
 + WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
   vlr.state |= LR_STATE_PENDING;
   kvm_debug(Set pending: 0x%x\n, vlr.state);
   }
 @@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
   } else {
   if (level_triggered) {
   vgic_dist_irq_clear_level(vcpu, irq_num);
 - if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
 + if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
   vgic_dist_irq_clear_pending(vcpu, irq_num);
 + vgic_cpu_irq_clear(vcpu, irq_num);

I think you're missing a potential change to the irq_pending_on_cpu
field here, which you have to compute by calling vgic_update_state()
like we do elsewhere when we change status bits (note that this is
different from the incorrect approach I suggested yesterday where we
always just clear the bit for that vcpu).

 + }
   }
  
   ret = false;
 -- 
 2.4.4
 

-Christoffer


Re: [PATCH v2 10/15] KVM: arm64: add data structures to model ITS interrupt translation

2015-08-27 Thread Eric Auger
Hi Andre,
On 08/25/2015 01:15 PM, Andre Przywara wrote:
 Hi Eric,
 
 On 13/08/15 16:46, Eric Auger wrote:

 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The GICv3 Interrupt Translation Service (ITS) uses tables in memory
 to allow a sophisticated interrupt routing. It features device tables,
 an interrupt table per device and a table connecting collections to
 actual CPUs (aka. redistributors in the GICv3 lingo).
 Since the interrupt numbers for the LPIs are allocated quite sparsely
 and the range can be quite huge (8192 LPIs being the minimum), using
 bitmaps or arrays for storing information is a waste of memory.
  We use linked lists instead, which we iterate linearly. This works
 very well with the actual number of LPIs/MSIs in the guest being
 quite low. Should the number of LPIs exceed the number where iterating
 through lists seems acceptable, we can later revisit this and use more
 efficient data structures.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  3 +++
  virt/kvm/arm/its-emul.c | 48 
 
  2 files changed, 51 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index b432055..1648668 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -25,6 +25,7 @@
   #include <linux/spinlock.h>
   #include <linux/types.h>
   #include <kvm/iodev.h>
  +#include <linux/list.h>
  
   #define VGIC_NR_IRQS_LEGACY	256
  #define VGIC_NR_SGIS   16
 @@ -162,6 +163,8 @@ struct vgic_its {
 u64 cbaser;
 int creadr;
 int cwriter;
 +   struct list_headdevice_list;
 +   struct list_headcollection_list;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b498f06..7f217fa 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -21,6 +21,7 @@
   #include <linux/kvm.h>
   #include <linux/kvm_host.h>
   #include <linux/interrupt.h>
  +#include <linux/list.h>
   
   #include <linux/irqchip/arm-gic-v3.h>
   #include <kvm/arm_vgic.h>
 @@ -32,6 +33,25 @@
   #include "vgic.h"
   #include "its-emul.h"
  
 +struct its_device {
 +   struct list_head dev_list;
 +   struct list_head itt;
 +   u32 device_id;
 +};
 +
 +struct its_collection {
 +   struct list_head coll_list;
 +   u32 collection_id;
 +   u32 target_addr;
 +};
 +
 +struct its_itte {
 +   struct list_head itte_list;
 +   struct its_collection *collection;
 +   u32 lpi;
 +   u32 event_id;
 +};
 +
   #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
  
  /* The distributor lock is held by the VGIC MMIO handler. */
 @@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm)
  
  	spin_lock_init(&its->lock);
  
  +   INIT_LIST_HEAD(&its->device_list);
  +   INIT_LIST_HEAD(&its->collection_list);
 +
  	its->enabled = false;
  
 return -ENXIO;
 @@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm)
  {
  	struct vgic_dist *dist = &kvm->arch.vgic;
  	struct vgic_its *its = &dist->its;
 +   struct its_device *dev;
 +   struct its_itte *itte;
 +   struct list_head *dev_cur, *dev_temp;
 +   struct list_head *cur, *temp;
  
 if (!vgic_has_its(kvm))
 return;
  
  +   if (!its->device_list.next)
  Why not use list_empty()? But I think I would simply remove this since
  the empty case is handled below...
 
 list_empty() requires the list to be initialized before. This check here
 is to detect that map_resources was never called (this is only done on
 the first VCPU run) and thus device_list is basically still all zeroes.
 If we abort the guest without ever running a VCPU (for instance because
 some initialization failed), we call vits_destroy() anyway (because this
 is called when tearing down the VGIC device).
 So the check is here to detect early that vits_destroy() has been called
 without the ITS ever been fully initialized. This fixed a real bug when
 the guest start was aborted before the ITS was ever used.
 I will add a comment to make this clear.

OK. My next question is why don't we call vits_init in vgic_init after
dist->nr_cpus? I had in mind map_resources was linked to the setting of
vgic_dist_base & vgic_cpu_base. Here you do not depend on those
addresses? Do I miss something?
 
 +   return;
 +
  +   spin_lock(&its->lock);
  +   list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
  +   dev = container_of(dev_cur, struct its_device, dev_list);
  isn't the usage of list_for_each_entry_safe more concise here?
 Sorry, my point was about the _ENTRY_ variant, which should avoid the use
 of container_of, isn't it?

Eric
 
 If I got this correctly, we need the _safe variant if we want to remove
 the list item within the loop. Or am I missing something here?
 
 Cheers,
 Andre.
 
 
  +   list_for_each_safe(cur, temp, &dev->itt) {
 +   itte = (container_of(cur, struct its_itte, itte_list));
 same

 Eric
 +   list_del(cur);
 +   kfree(itte);
 +   }
 +   

[PATCH] arm64: KVM: Do not inject a 64bit fault for a 32bit guest

2015-08-27 Thread Marc Zyngier
When injecting a fault into a 32bit guest, it seems rather idiotic
to also inject a 64bit fault that is only going to corrupt the
guest state, and lead to a situation where we restore an illegal
context.

Just fix the stupid bug that has been there from day 1.

Cc: sta...@vger.kernel.org
Reported-by: Russell King li...@arm.linux.org.uk
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
Will: Paolo being on holiday, do you mind merging this one
via your tree?

Thanks,

M.

 arch/arm64/kvm/inject_fault.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index f02530e..85c5715 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -168,8 +168,8 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
inject_abt32(vcpu, false, addr);
-
-   inject_abt64(vcpu, false, addr);
+   else
+   inject_abt64(vcpu, false, addr);
 }
 
 /**
@@ -184,8 +184,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
inject_abt32(vcpu, true, addr);
-
-   inject_abt64(vcpu, true, addr);
+   else
+   inject_abt64(vcpu, true, addr);
 }
 
 /**
@@ -198,6 +198,6 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 {
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
inject_undef32(vcpu);
-
-   inject_undef64(vcpu);
+   else
+   inject_undef64(vcpu);
 }
-- 
2.1.4



Re: A question about vring operation

2015-08-27 Thread Michael S. Tsirkin
On Thu, Aug 27, 2015 at 01:18:47AM +, Wang, Wei W wrote:
 Hi all,
 
  
 
 I have a question about the vring_avail:
 
 It only includes an idx (equivalent to the ring tail), which is used by the
 frontend (virtio_net) to fill bufs. The backend (e.g. vhost_net) maintains the
 ring head (last_avail_idx) by itself. The frontend checks if the ring is full
 or empty via a counter (vq->num_free).
 
 My question is why can't we include the ring head in the vring_avail struct,
 so that vq->num_free is not needed, and the backend can directly use it
 without maintaining its own copy?
 
  
 
 Thanks,
 
 Wei
 
  
 

I'm not sure I understand your proposal, and what it would accomplish.
Write a patch; that'll make it easier to discuss.
Also copy all relevant mailing lists, not just kvm.




Re: add multiple times opening support to a virtserialport

2015-08-27 Thread Christopher Covington
On 07/24/2015 08:00 AM, Matt Ma wrote:
 Hi all,
 
 Linaro has developed the foundation for the new Android Emulator code
 base based on a fairly recent upstream QEMU code base, when we
 re-based the code, we updated the device model to be more virtio based
 (for example the drives are now virtio block devices). The aim of this
 is to minimise the delta between upstream and the Android specific
 changes to QEMU. One Android emulator specific feature is the
 AndroidPipe.
 
 AndroidPipe is a communication channel between the guest system and
 the emulator itself. The guest-side device node can be opened by multiple
 processes at the same time with different service names. It has a
 de-multiplexer on the QEMU side to figure out which service the guest
 actually wanted, so the first write after opening the device node is the
 service name the guest wanted. After the QEMU backend receives this
 service name, it creates a corresponding communication channel and
 initializes related components, such as the file descriptor which
 connects to the host socket server. So each opening in the guest will
 create a separate communication channel.
 
 We can create a separate device for each service type, however some
 services, such as the OpenGL emulation, need to have multiple open
 channels at a time. This is currently not possible using the
 virtserialport which can only be opened once.
 
 The current virtserialport cannot be opened by multiple processes at the
 same time. I know virtserialport has provided buffers beforehand to
 cache data from host to guest, so even if there is no guest read, data
 can still be transported from host to guest kernel; when there is a
 guest read request, just copy cached data to user space.
 
 We are not sure whether virtio can support
 multi-open-per-device semantics or not; the following are just our
 initial ideas about adding a multi-open-per-device feature to a port:
 
 * when there is a open request on a port, kernel will allocate a
 portclient with new id and __wait_queue_head to track this request
 * save this portclient in file-private_data
 * guest kernel pass this portclient info to QEMU and notify that the
 port has been opened
 * QEMU backend will create a clientinfo struct to track this
 communication channel, initialize related component
 * we may change the kernel side strategy of allocating receiving
 buffers in advance to a new strategy, that is when there is a read
 request:
 - allocate a port_buffer, put user space buffer address to
 port_buffer.buf, share memory to avoid memcpy
 - put both portclient id(or portclient address) and port_buffer.buf
 to virtqueue, that is the length of buffers chain is 2
 - kick to notify QEMU backend to consume read buffer
 - QEMU backend read portclient info firstly to find the correct
 clientinfo, then read host data directly into virtqueue buffer to
 avoid memcpy
 - guest kernel will wait(similarly in block mode, because the user
 space address has been put into virtqueue) until QEMU backend has
 consumed buffer(all data/part data/nothing have been sent to host
 side)
 - if nothing has been read from host and file descriptor is in
 block mode, read request will wait through __wait_queue_head until
 host side is readable
 
 * above read logic may change the current behavior of transferring
 data to guest kernel even without guest user read
 
 * when there is a write request:
 - allocate a port_buffer, put user space buffer address to
 port_buffer.buf, share memory to avoid memcpy
 - put both portclient id(or portclient address) and port_buffer.buf
 to virtqueue, the length of buffers chain is 2
 - kick to notify QEMU backend to consume write buffer
 - QEMU backend read portclient info firstly to find the correct
 clientinfo, then write the virtqueue buffer content to host side as
 current logic
 - guest kernel will wait(similarly in block mode, because the user
 space address has been put into virtqueue) until QEMU backend has
 consumed buffer(all data/part data/nothing have been receive from host
 side)
 - if nothing has been sent out and file descriptor is in block
 mode, write request will wait through __wait_queue_head until host
 side is writable
 
 We obviously don't want to regress existing virtio behaviour and
 performance and welcome the communities expertise to point out
 anything we may have missed out before we get too far down implementing
 our initial proof-of-concept.

Would virtio-vsock be interesting for your purposes?

http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf

(Video doesn't seem to be up yet, but should probably be available eventually
at the following link)

https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep

Regards,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[GIT PULL] Late arm64 KVM fix for 4.2

2015-08-27 Thread Will Deacon
Hi Linus,

I appreciate that it's extremely late in the cycle, but we've uncovered
a nasty bug in the arm64 KVM code which allows a badly behaved 32-bit
guest to bring down the host. The fix is simple (it's what I believe we
call a brown paper bag bug) and I don't think it makes sense to sit on
this, particularly as Russell ended up triggering this rather than just
somebody noticing a potential problem by inspection.

Usually arm64 KVM changes would go via Paolo's tree, but he's on holiday
at the moment and the deal is that anything urgent gets shuffled via
the arch trees, so here it is. Please pull.

Cheers,

Will

---8

The following changes since commit c13dcf9f2d6f5f06ef1bf79ec456df614c5e058b:

  Linux 4.2-rc8 (2015-08-23 20:52:59 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes

for you to fetch changes up to 126c69a0bd0e441bf6766a5d9bf20de011be9f68:

  arm64: KVM: Fix host crash when injecting a fault into a 32bit guest 
(2015-08-27 16:16:55 +0100)


Urgent arm64 KVM fix for 4.2:

Fix arm64 KVM issue when injecting an abort into a 32-bit guest, which
would lead to an illegal exception return at EL2 and a subsequent host
crash.


Marc Zyngier (1):
  arm64: KVM: Fix host crash when injecting a fault into a 32bit guest

 arch/arm64/kvm/inject_fault.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)


[PATCH v2] arm64: KVM: Do not inject a 64bit fault for a 32bit guest

2015-08-27 Thread Marc Zyngier
When injecting a fault into a misbehaving 32bit guest, it seems
rather idiotic to also inject a 64bit fault that is only going
to corrupt the guest state. This leads to a situation where we
perform an illegal exception return at EL2 causing the host
to crash instead of killing the guest.

Just fix the stupid bug that has been there from day 1.

Cc: sta...@vger.kernel.org
Reported-by: Russell King rmk+ker...@arm.linux.org.uk
Tested-by: Russell King rmk+ker...@arm.linux.org.uk
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 arch/arm64/kvm/inject_fault.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index f02530e..85c5715 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -168,8 +168,8 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
inject_abt32(vcpu, false, addr);
-
-   inject_abt64(vcpu, false, addr);
+   else
+   inject_abt64(vcpu, false, addr);
 }
 
 /**
@@ -184,8 +184,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
inject_abt32(vcpu, true, addr);
-
-   inject_abt64(vcpu, true, addr);
+   else
+   inject_abt64(vcpu, true, addr);
 }
 
 /**
@@ -198,6 +198,6 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 {
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
inject_undef32(vcpu);
-
-   inject_undef64(vcpu);
+   else
+   inject_undef64(vcpu);
 }
-- 
2.1.4



Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-27 Thread David Matlack
On Thu, Aug 27, 2015 at 2:59 AM, Wanpeng Li wanpeng...@hotmail.com wrote:
 Hi David,
 On 8/26/15 1:19 AM, David Matlack wrote:

 Thanks for writing v2, Wanpeng.

 On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com
 wrote:

 There is a downside of halt_poll_ns since polling still happens for idle
 VCPUs, which can waste CPU usage. This patch adds the ability to adjust
 halt_poll_ns dynamically.

 What testing have you done with these patches? Do you know if this removes
 the overhead of polling in idle VCPUs? Do we lose any of the performance
 from always polling?

 There are two new kernel parameters for changing the halt_poll_ns:
 halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
 halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
 rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow
 matrix is suggested by David:

 if (poll successfully for interrupt): stay the same
 else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
 else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

 The way you implemented this wasn't what I expected. I thought you would
 time
 the whole function (kvm_vcpu_block). But I like your approach better. It's
 simpler and [by inspection] does what we want.


 I see there is even more idle-vCPU overhead with this method than with
 always halt-polling, so I brought back growing vcpu->halt_poll_ns when an
 interrupt arrives and shrinking it when an idle VCPU is detected. The
 performance looks good in v4.

Why did this patch have a worse idle overhead than always poll?


 Regards,
 Wanpeng Li



halt_poll_ns_shrink/ |
halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
---------------------+----------------------+------------------------
< 1                  |  = halt_poll_ns      |  = 0
< halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

 I was curious why you went with this approach rather than just the
 middle row, or just the last row. Do you think we'll want the extra
 flexibility?

 Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
 ---
   virt/kvm/kvm_main.c | 65
 -
   1 file changed, 64 insertions(+), 1 deletion(-)

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 93db833..2a4962b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -66,9 +66,26 @@
    MODULE_AUTHOR("Qumranet");
    MODULE_LICENSE("GPL");

 -static unsigned int halt_poll_ns;
 +#define KVM_HALT_POLL_NS 500000
 +#define KVM_HALT_POLL_NS_GROW 2
 +#define KVM_HALT_POLL_NS_SHRINK 0
 +#define KVM_HALT_POLL_NS_MAX 2000000

 The macros are not necessary. Also, hard coding the numbers in the param
 definitions will make reading the comments above them easier.

 +
 +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
   module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);

 +/* Default doubles per-vcpu halt_poll_ns. */
 +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
 +module_param(halt_poll_ns_grow, int, S_IRUGO);
 +
 +/* Default resets per-vcpu halt_poll_ns. */
 +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
 +module_param(halt_poll_ns_shrink, int, S_IRUGO);
 +
 +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */

 Ah, I misspoke before. I was thinking about round-trip latency. The
 latency
 of a single halt is reduced by about 5-7 us.

 +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
 +module_param(halt_poll_ns_max, int, S_IRUGO);

 We can remove halt_poll_ns_max. vcpu->halt_poll_ns can always start at zero
 and grow from there. Then we just need one module param to keep
 vcpu->halt_poll_ns from growing too large.

 [ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max,
   but since halt_poll_ns already exists in upstream kernels, we probably can't
   remove it. ]

 +
   /*
* Ordering of locks:
*
 @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu
 *vcpu, gfn_t gfn)
   }
   EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

 +static unsigned int __grow_halt_poll_ns(unsigned int val)
 +{
 +   if (halt_poll_ns_grow < 1)
 +   return halt_poll_ns;
 +
 +   val = min(val, halt_poll_ns_max);
 +
 +   if (val == 0)
 +   return halt_poll_ns;
 +
 +   if (halt_poll_ns_grow < halt_poll_ns)
 +   val *= halt_poll_ns_grow;
 +   else
 +   val += halt_poll_ns_grow;
 +
 +   return val;
 +}
 +
 +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)

 minimum never gets used.

 +{
 +   if (modifier < 1)
 +   return 0;
 +
 +   if (modifier < halt_poll_ns)
 +   val /= modifier;
 +   else
 +   val -= modifier;
 +
 +   return val;
 +}
 +
 +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)

 These 

Re: add multiple times opening support to a virtserialport

2015-08-27 Thread Christoffer Dall
On Thu, Aug 27, 2015 at 10:23:38AM -0400, Christopher Covington wrote:
 On 07/24/2015 08:00 AM, Matt Ma wrote:
  Hi all,
  
  Linaro has developed the foundation for the new Android Emulator code
  base based on a fairly recent upstream QEMU code base, when we
  re-based the code, we updated the device model to be more virtio based
  (for example the drives are now virtio block devices). The aim of this
  is to minimise the delta between upstream and the Android specific
  changes to QEMU. One Android emulator specific feature is the
  AndroidPipe.
  
  AndroidPipe is a communication channel between the guest system and
  the emulator itself. Guest side device node can be opened by multi
  processes at the same time with different service name. It has a
  de-multiplexer on the QEMU side to figure out which service the guest
  actually wanted, so the first write after opening device node is the
  service name guest wanted, after QEMU backend receive this service
  name, create a corresponding communication channel, initialize related
  component, such as file descriptor which connect to the host socket
  server. So each opening in the guest will create a separate communication
  channel.
  
  We can create a separate device for each service type, however some
  services, such as the OpenGL emulation, need to have multiple open
  channels at a time. This is currently not possible using the
  virtserialport which can only be opened once.
  
  Current virtserialport cannot be opened by multiple processes at the
  same time. I know virtserialport has provided buffers beforehand to
  cache data from host to guest, so even there is no guest read, data
  can still be transported from host to guest kernel, when there is
  guest read request, just copy cached data to user space.
  
  We are not sure clearly whether virtio can support
  multi-open-per-device semantics or not, followings are just our
  initial ideas about adding multi-open-per-device feature to a port:
  
  * when there is a open request on a port, kernel will allocate a
  portclient with new id and __wait_queue_head to track this request
  * save this portclient in file->private_data
  * guest kernel pass this portclient info to QEMU and notify that the
  port has been opened
  * QEMU backend will create a clientinfo struct to track this
  communication channel, initialize related component
  * we may change the kernel side strategy of allocating receiving
  buffers in advance to a new strategy, that is when there is a read
  request:
  - allocate a port_buffer, put user space buffer address to
  port_buffer.buf, share memory to avoid memcpy
  - put both portclient id (or portclient address) and port_buffer.buf
  to virtqueue, that is the length of buffers chain is 2
  - kick to notify QEMU backend to consume read buffer
  - QEMU backend read portclient info firstly to find the correct
  clientinfo, then read host data directly into virtqueue buffer to
  avoid memcpy
  - guest kernel will wait(similarly in block mode, because the user
  space address has been put into virtqueue) until QEMU backend has
  consumed buffer(all data/part data/nothing have been sent to host
  side)
  - if nothing has been read from host and file descriptor is in
  block mode, read request will wait through __wait_queue_head until
  host side is readable
  
  * above read logic may change the current behavior of transferring
  data to guest kernel even without guest user read
  
  * when there is a write request:
  - allocate a port_buffer, put user space buffer address to
  port_buffer.buf, share memory to avoid memcpy
  - put both portclient id (or portclient address) and port_buffer.buf
  to virtqueue, the length of buffers chain is 2
  - kick to notify QEMU backend to consume write buffer
  - QEMU backend read portclient info firstly to find the correct
  clientinfo, then write the virtqueue buffer content to host side as
  current logic
  - guest kernel will wait(similarly in block mode, because the user
  space address has been put into virtqueue) until QEMU backend has
  consumed buffer(all data/part data/nothing have been receive from host
  side)
  - if nothing has been sent out and file descriptor is in block
  mode, write request will wait through __wait_queue_head until host
  side is writable
  
  We obviously don't want to regress existing virtio behaviour and
  performance and welcome the communities expertise to point out
  anything we may have missed out before we get too far down implementing
  our initial proof-of-concept.

Hi Chris,

 
 Would virtio-vsock be interesting for your purposes?
 
 http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf
 
 (Video doesn't seem to be up yet, but should probably be available eventually
 at the following link)
 
 https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep
 
Thanks for looking at this lengthy mail.  Yes, we are 

Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-27 Thread Wanpeng Li

On 8/28/15 12:25 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:59 AM, Wanpeng Li wanpeng...@hotmail.com wrote:

Hi David,
On 8/26/15 1:19 AM, David Matlack wrote:

Thanks for writing v2, Wanpeng.

On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com
wrote:

There is a downside of halt_poll_ns since polling still happens for idle
VCPUs, which can waste cpu usage. This patch adds the ability to adjust
halt_poll_ns dynamically.

What testing have you done with these patches? Do you know if this removes
the overhead of polling in idle VCPUs? Do we lose any of the performance
from always polling?


There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
rounded down to the closest multiple of halt_poll_ns_grow. The shrink/grow
matrix is suggested by David:

if (poll successfully for interrupt): stay the same
else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

The way you implemented this wasn't what I expected. I thought you would
time
the whole function (kvm_vcpu_block). But I like your approach better. It's
simpler and [by inspection] does what we want.


I see there is more idle vCPU overhead w/ this method, even more than always
halt-poll, so I bring back growing vcpu->halt_poll_ns when an interrupt arrives
and shrinking it when an idle VCPU is detected. The performance looks good in v4.

Why did this patch have a worse idle overhead than always poll?


I'm not sure. I made a mistake when I reported the kernel-build test; the
performance is also worse than always poll w/ your method. I think your
method didn't grow halt_poll_ns according to whether an interrupt arrives.


Regards,
Wanpeng Li




Regards,
Wanpeng Li



halt_poll_ns_shrink/ |                      |
halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
---------------------+----------------------+------------------------
< 1                  | = halt_poll_ns       | = 0
< halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

I was curious why you went with this approach rather than just the
middle row, or just the last row. Do you think we'll want the extra
flexibility?


Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
   virt/kvm/kvm_main.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
   1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 93db833..2a4962b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,9 +66,26 @@
   MODULE_AUTHOR("Qumranet");
   MODULE_LICENSE("GPL");

-static unsigned int halt_poll_ns;
+#define KVM_HALT_POLL_NS 500000
+#define KVM_HALT_POLL_NS_GROW 2
+#define KVM_HALT_POLL_NS_SHRINK 0
+#define KVM_HALT_POLL_NS_MAX 2000000

The macros are not necessary. Also, hard coding the numbers in the param
definitions will make reading the comments above them easier.


+
+static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
   module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);

+/* Default doubles per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
+module_param(halt_poll_ns_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu halt_poll_ns . */
+static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
+module_param(halt_poll_ns_shrink, int, S_IRUGO);
+
+/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */

Ah, I misspoke before. I was thinking about round-trip latency. The
latency
of a single halt is reduced by about 5-7 us.


+static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
+module_param(halt_poll_ns_max, int, S_IRUGO);

We can remove halt_poll_ns_max. vcpu->halt_poll_ns can always start at zero
and grow from there. Then we just need one module param to keep
vcpu->halt_poll_ns from growing too large.

[ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max,
but since halt_poll_ns already exists in upstream kernels, we probably can't
remove it. ]


+
   /*
* Ordering of locks:
*
@@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu
*vcpu, gfn_t gfn)
   }
   EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

+static unsigned int __grow_halt_poll_ns(unsigned int val)
+{
+   if (halt_poll_ns_grow < 1)
+   return halt_poll_ns;
+
+   val = min(val, halt_poll_ns_max);
+
+   if (val == 0)
+   return halt_poll_ns;
+
+   if (halt_poll_ns_grow < halt_poll_ns)
+   val *= halt_poll_ns_grow;
+   else
+   val += halt_poll_ns_grow;
+
+   return val;
+}
+
+static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)

minimum never gets used.


+{
+   if (modifier < 1)
+   return 0;
+
+   if (modifier < 

Re: [Qemu-devel] add multiple times opening support to a virtserialport

2015-08-27 Thread Asias He
Hello Christoffer,

On Fri, Aug 28, 2015 at 2:30 AM, Christoffer Dall
christoffer.d...@linaro.org wrote:
 On Thu, Aug 27, 2015 at 10:23:38AM -0400, Christopher Covington wrote:
 On 07/24/2015 08:00 AM, Matt Ma wrote:
  Hi all,
 
  Linaro has developed the foundation for the new Android Emulator code
  base based on a fairly recent upstream QEMU code base, when we
  re-based the code, we updated the device model to be more virtio based
  (for example the drives are now virtio block devices). The aim of this
  is to minimise the delta between upstream and the Android specific
  changes to QEMU. One Android emulator specific feature is the
  AndroidPipe.
 
  AndroidPipe is a communication channel between the guest system and
  the emulator itself. Guest side device node can be opened by multi
  processes at the same time with different service name. It has a
  de-multiplexer on the QEMU side to figure out which service the guest
  actually wanted, so the first write after opening device node is the
  service name guest wanted, after QEMU backend receive this service
  name, create a corresponding communication channel, initialize related
  component, such as file descriptor which connect to the host socket
  server. So each opening in the guest will create a separate communication
  channel.
 
  We can create a separate device for each service type, however some
  services, such as the OpenGL emulation, need to have multiple open
  channels at a time. This is currently not possible using the
  virtserialport which can only be opened once.
 
  Current virtserialport cannot be opened by multiple processes at the
  same time. I know virtserialport has provided buffers beforehand to
  cache data from host to guest, so even there is no guest read, data
  can still be transported from host to guest kernel, when there is
  guest read request, just copy cached data to user space.
 
  We are not sure clearly whether virtio can support
  multi-open-per-device semantics or not, followings are just our
  initial ideas about adding multi-open-per-device feature to a port:
 
  * when there is a open request on a port, kernel will allocate a
  portclient with new id and __wait_queue_head to track this request
  * save this portclient in file->private_data
  * guest kernel pass this portclient info to QEMU and notify that the
  port has been opened
  * QEMU backend will create a clientinfo struct to track this
  communication channel, initialize related component
  * we may change the kernel side strategy of allocating receiving
  buffers in advance to a new strategy, that is when there is a read
  request:
  - allocate a port_buffer, put user space buffer address to
  port_buffer.buf, share memory to avoid memcpy
  - put both portclient id (or portclient address) and port_buffer.buf
  to virtqueue, that is the length of buffers chain is 2
  - kick to notify QEMU backend to consume read buffer
  - QEMU backend read portclient info firstly to find the correct
  clientinfo, then read host data directly into virtqueue buffer to
  avoid memcpy
  - guest kernel will wait(similarly in block mode, because the user
  space address has been put into virtqueue) until QEMU backend has
  consumed buffer(all data/part data/nothing have been sent to host
  side)
  - if nothing has been read from host and file descriptor is in
  block mode, read request will wait through __wait_queue_head until
  host side is readable
 
  * above read logic may change the current behavior of transferring
  data to guest kernel even without guest user read
 
  * when there is a write request:
  - allocate a port_buffer, put user space buffer address to
  port_buffer.buf, share memory to avoid memcpy
  - put both portclient id (or portclient address) and port_buffer.buf
  to virtqueue, the length of buffers chain is 2
  - kick to notify QEMU backend to consume write buffer
  - QEMU backend read portclient info firstly to find the correct
  clientinfo, then write the virtqueue buffer content to host side as
  current logic
  - guest kernel will wait(similarly in block mode, because the user
  space address has been put into virtqueue) until QEMU backend has
  consumed buffer(all data/part data/nothing have been receive from host
  side)
  - if nothing has been sent out and file descriptor is in block
  mode, write request will wait through __wait_queue_head until host
  side is writable
 
  We obviously don't want to regress existing virtio behaviour and
  performance and welcome the communities expertise to point out
  anything we may have missed out before we get too far down implementing
  our initial proof-of-concept.

 Hi Chris,


 Would virtio-vsock be interesting for your purposes?

 http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf

 (Video doesn't seem to be up yet, but should probably be available eventually
 at the following link)

 

RE: A question about vring operation

2015-08-27 Thread Wang, Wei W

On 24/07/2015 21:36,  Michael S. Tsirkin wrote:
On Thu, Aug 27, 2015 at 01:18:47AM +, Wang, Wei W wrote:
 Hi all,
 
  
 
 I have a question about the vring_avail:
 
 It only includes an idx (equivalent to the ring tail), which is used 
 by the frontend (virtio_net) to fill bufs. The backend (e.g. 
 vhost_net) maintains the ring head (last_avail_idx) by itself. The 
 frontend checks if the ring is full or empty via a counter (vq->num_free).
 
 My question is why can't we include the ring head in the vring_avail
 struct, so that the vq->num_free is not needed, and the backend can
 directly use it without maintaining its own copy?
 
  
 
 Thanks,
 
 Wei
 
  
 
I'm not sure I understand your proposal, and what it would accomplish. Write a 
patch, that'll make it easier to discuss.
Also copy all relevant mailing lists, not just kvm.

Thanks Michael. I haven’t got the patch ready yet. I am just wondering if it is 
better to have last_avail_idx in vhost_virtqueue moved into vring_avail.

In other regular ring operations, we usually use *head and *tail to judge if 
the ring is full or empty, but it seems vring_avail does not use this method in 
virtio_net.c (it uses a counter). 

Best,
Wei



[PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-27 Thread Wanpeng Li
v3 -> v4:
 * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrink
   when idle VCPU is detected

v2 -> v3:
 * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
 * drop the macros and hard code the numbers in the param definitions
 * update the comments to "5-7 us"
 * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
   vcpu->halt_poll_ns starts at zero
 * drop the wrappers
 * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
   the module parameter
 * use the shrink/grow matrix which is suggested by David
 * set halt_poll_ns_max to 2ms

There is a downside of halt_poll_ns since polling still happens for idle
VCPUs, which can waste cpu usage. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns if an interrupt arrives and
shrinks halt_poll_ns when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. 


Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of 
halt-poll is the default 500000ns, the max halt_poll_ns of dynamic 
halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool.
The test method is almost from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |      ~1.8%     |       ~1.2%       |
+-----------------+----------------+-------------------+
 
Always halt-poll increases cpu usage for idle vCPUs by ~0.9% and dynamic
halt-poll drops that increase to ~0.3%, which removes about 67% of the
overhead introduced by always halt-poll.

Wanpeng Li (3):
  KVM: make halt_poll_ns per-VCPU
  KVM: dynamic halt_poll_ns adjustment
  KVM: trace kvm_halt_poll_ns grow/shrink

 include/linux/kvm_host.h   |  1 +
 include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 3 deletions(-)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/3] KVM: trace kvm_halt_poll_ns grow/shrink

2015-08-27 Thread Wanpeng Li
Tracepoint for dynamic halt_poll_ns, fired on every potential change.

Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
 include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c        |  8 ++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a44062d..75ddf80 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -356,6 +356,36 @@ TRACE_EVENT(
   __entry->address)
 );
 
+TRACE_EVENT(kvm_halt_poll_ns,
+   TP_PROTO(bool grow, unsigned int vcpu_id, int new, int old),
+   TP_ARGS(grow, vcpu_id, new, old),
+
+   TP_STRUCT__entry(
+   __field(bool, grow)
+   __field(unsigned int, vcpu_id)
+   __field(int, new)
+   __field(int, old)
+   ),
+
+   TP_fast_assign(
+   __entry->grow   = grow;
+   __entry->vcpu_id= vcpu_id;
+   __entry->new    = new;
+   __entry->old    = old;
+   ),
+
+   TP_printk("vcpu %u: halt_poll_ns %d (%s %d)",
+   __entry->vcpu_id,
+   __entry->new,
+   __entry->grow ? "grow" : "shrink",
+   __entry->old)
+);
+
+#define trace_kvm_halt_poll_ns_grow(vcpu_id, new, old) \
+   trace_kvm_halt_poll_ns(true, vcpu_id, new, old)
+#define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \
+   trace_kvm_halt_poll_ns(false, vcpu_id, new, old)
+
 #endif
 
 #endif /* _TRACE_KVM_MAIN_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d63790d..a3aa23f0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1918,8 +1918,9 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
 
 static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
-   int val = vcpu->halt_poll_ns;
+   int old, val;
 
+   old = val = vcpu->halt_poll_ns;
 /* 500us step */
 if (val == 0 && halt_poll_ns_grow)
 val = 500000;
@@ -1927,18 +1928,21 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 val *= halt_poll_ns_grow;
 
 vcpu->halt_poll_ns = val;
+   trace_kvm_halt_poll_ns_grow(vcpu->vcpu_id, val, old);
 }
 
 static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
-   int val = vcpu->halt_poll_ns;
+   int old, val;
 
+   old = val = vcpu->halt_poll_ns;
 if (halt_poll_ns_shrink == 0)
 val = 0;
 else
 val /= halt_poll_ns_shrink;
 
 vcpu->halt_poll_ns = val;
+   trace_kvm_halt_poll_ns_shrink(vcpu->vcpu_id, val, old);
 }
 
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/3] KVM: make halt_poll_ns per-VCPU

2015-08-27 Thread Wanpeng Li
Change halt_poll_ns into per-VCPU variable, seeded from module parameter,
to allow greater flexibility.

Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c  | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 81089cf..1bef9e2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -242,6 +242,7 @@ struct kvm_vcpu {
int sigset_active;
sigset_t sigset;
struct kvm_vcpu_stat stat;
+   unsigned int halt_poll_ns;
 
 #ifdef CONFIG_HAS_IOMEM
int mmio_needed;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8db2f8f..c06e57c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -217,6 +217,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, 
unsigned id)
 vcpu->kvm = kvm;
 vcpu->vcpu_id = id;
 vcpu->pid = NULL;
+   vcpu->halt_poll_ns = 0;
 init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);
 
@@ -1930,8 +1931,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
bool waited = false;
 
start = cur = ktime_get();
-   if (halt_poll_ns) {
-   ktime_t stop = ktime_add_ns(ktime_get(), halt_poll_ns);
+   if (vcpu->halt_poll_ns) {
+   ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
do {
/*
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-27 Thread Wanpeng Li
v3 -> v4:
 * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrink
   when idle VCPU is detected

v2 -> v3:
 * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
 * drop the macros and hard code the numbers in the param definitions
 * update the comments to "5-7 us"
 * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
   vcpu->halt_poll_ns starts at zero
 * drop the wrappers
 * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
   the module parameter
 * use the shrink/grow matrix which is suggested by David
 * set halt_poll_ns_max to 2ms

There is a downside of halt_poll_ns since polling still happens for idle
VCPUs, which can waste cpu usage. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns if an interrupt arrives and
shrinks halt_poll_ns when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. 


Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of 
halt-poll is the default 500000ns, the max halt_poll_ns of dynamic 
halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool.
The test method is almost from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |      ~1.8%     |       ~1.2%       |
+-----------------+----------------+-------------------+
 
Always halt-poll increases cpu usage for idle vCPUs by ~0.9% and dynamic
halt-poll drops that increase to ~0.3%, which removes about 67% of the
overhead introduced by always halt-poll.

Wanpeng Li (3):
  KVM: make halt_poll_ns per-VCPU
  KVM: dynamic halt_poll_ns adjustment
  KVM: trace kvm_halt_poll_ns grow/shrink

 include/linux/kvm_host.h   |  1 +
 include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 3 deletions(-)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-27 Thread Wanpeng Li
There is a downside of halt_poll_ns since polling still happens for idle
VCPUs, which can waste cpu usage. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns if an interrupt arrives and
shrinks halt_poll_ns when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. 

Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of 
halt-poll is the default 500000ns, the max halt_poll_ns of dynamic 
halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool.
The test method is almost from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |      ~1.8%     |       ~1.2%       |
+-----------------+----------------+-------------------+
 
Always halt-poll increases cpu usage for idle vCPUs by ~0.9% and dynamic
halt-poll drops that increase to ~0.3%, which removes about 67% of the
overhead introduced by always halt-poll.

Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
 virt/kvm/kvm_main.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c06e57c..d63790d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,9 +66,18 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
-static unsigned int halt_poll_ns;
+/* halt polling only reduces halt latency by 5-7 us, 2ms is enough */
+static unsigned int halt_poll_ns = 2000000;
 module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
 
+/* Default doubles per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_grow = 2;
+module_param(halt_poll_ns_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_shrink;
+module_param(halt_poll_ns_shrink, int, S_IRUGO);
+
 /*
  * Ordering of locks:
  *
@@ -1907,6 +1916,31 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, 
gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
 
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+   int val = vcpu->halt_poll_ns;
+
+   /* 500us step */
+   if (val == 0 && halt_poll_ns_grow)
+   val = 500000;
+   else
+   val *= halt_poll_ns_grow;
+
+   vcpu->halt_poll_ns = val;
+}
+
+static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+   int val = vcpu->halt_poll_ns;
+
+   if (halt_poll_ns_shrink == 0)
+   val = 0;
+   else
+   val /= halt_poll_ns_shrink;
+
+   vcpu->halt_poll_ns = val;
+}
+
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 {
if (kvm_arch_vcpu_runnable(vcpu)) {
@@ -1961,6 +1995,11 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
cur = ktime_get();
 
 out:
+   if (waited && vcpu->halt_poll_ns > 0)
+   shrink_halt_poll_ns(vcpu);
+   else if (vcpu->halt_poll_ns < halt_poll_ns)
+   grow_halt_poll_ns(vcpu);
+
trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-27 Thread Wanpeng Li

Hi David,
On 8/26/15 1:19 AM, David Matlack wrote:

Thanks for writing v2, Wanpeng.

On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote:

There is a downside of halt_poll_ns since polling still happens for idle
VCPUs, which can waste cpu usage. This patch adds the ability to adjust
halt_poll_ns dynamically.

What testing have you done with these patches? Do you know if this removes
the overhead of polling in idle VCPUs? Do we lose any of the performance
from always polling?


There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
rounded down to the closest multiple of halt_poll_ns_grow. The shrink/grow
matrix is suggested by David:

if (poll successfully for interrupt): stay the same
   else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
   else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

The way you implemented this wasn't what I expected. I thought you would time
the whole function (kvm_vcpu_block). But I like your approach better. It's
simpler and [by inspection] does what we want.


I see there is more idle vCPU overhead w/ this method, even more than
always halt-poll, so I bring back growing vcpu->halt_poll_ns when an interrupt
arrives and shrinking it when an idle VCPU is detected. The performance looks
good in v4.


Regards,
Wanpeng Li




   halt_poll_ns_shrink/ |                      |
   halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
   ---------------------+----------------------+------------------------
   < 1                  | = halt_poll_ns       | = 0
   < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
   otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

I was curious why you went with this approach rather than just the
middle row, or just the last row. Do you think we'll want the extra
flexibility?


Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
  virt/kvm/kvm_main.c | 65 -
  1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 93db833..2a4962b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,9 +66,26 @@
  MODULE_AUTHOR("Qumranet");
  MODULE_LICENSE("GPL");

-static unsigned int halt_poll_ns;
+#define KVM_HALT_POLL_NS  500000
+#define KVM_HALT_POLL_NS_GROW   2
+#define KVM_HALT_POLL_NS_SHRINK 0
+#define KVM_HALT_POLL_NS_MAX 2000000

The macros are not necessary. Also, hard coding the numbers in the param
definitions will make reading the comments above them easier.


+
+static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
  module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);

+/* Default doubles per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
+module_param(halt_poll_ns_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu halt_poll_ns . */
+static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
+module_param(halt_poll_ns_shrink, int, S_IRUGO);
+
+/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */

Ah, I misspoke before. I was thinking about round-trip latency. The latency
of a single halt is reduced by about 5-7 us.


+static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
+module_param(halt_poll_ns_max, int, S_IRUGO);

We can remove halt_poll_ns_max. vcpu->halt_poll_ns can always start at zero
and grow from there. Then we just need one module param to keep
vcpu->halt_poll_ns from growing too large.

[ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max,
   but since halt_poll_ns already exists in upstream kernels, we probably can't
   remove it. ]


+
  /*
   * Ordering of locks:
   *
@@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

+static unsigned int __grow_halt_poll_ns(unsigned int val)
+{
+   if (halt_poll_ns_grow < 1)
+   return halt_poll_ns;
+
+   val = min(val, halt_poll_ns_max);
+
+   if (val == 0)
+   return halt_poll_ns;
+
+   if (halt_poll_ns_grow < halt_poll_ns)
+   val *= halt_poll_ns_grow;
+   else
+   val += halt_poll_ns_grow;
+
+   return val;
+}
+
+static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)

minimum never gets used.


+{
+   if (modifier < 1)
+   return 0;
+
+   if (modifier < halt_poll_ns)
+   val /= modifier;
+   else
+   val -= modifier;
+
+   return val;
+}
+
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)

These wrappers aren't necessary.


+{
+   vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
+}
+
+static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+   vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
+   

[PATCH] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts

2015-08-27 Thread Pavel Fedin
Commit 71760950bf3dc796e5e53ea3300dec724a09f593
(arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn) introduced
vgic_queue_irq_to_lr() function with additional vgic_dist_irq_is_pending()
check before setting LR_STATE_PENDING bit. In some cases it started
causing the following situation if the userland quickly drops the IRQ back
to inactive state for some reason:
1. Userland injects an IRQ with level == 1, this ends up in
   vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending()
   for this IRQ.
2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
   (!!!)
3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
   in this case will call vgic_dist_irq_clear_pending() and reset the
   pending flag.
4. vCPU finally wakes up. It successfully rolls through
   __kvm_vgic_flush_hwstate(), which populates vGIC registers. However,
   since neither pending nor active flags are now set for this IRQ,
   vgic_queue_irq_to_lr() does not set any state bits on this LR at all.
   Since this is level-sensitive IRQ, we end up in LR containing only
   LR_EOI_INT bit, causing unnecessary immediate exit from the guest.

This patch fixes the problem by adding forgotten vgic_cpu_irq_clear().
This causes the IRQ not to be included into any lists, if it has been
picked up after getting dropped to inactive level. Since this is a
level-sensitive IRQ, this is correct behavior.

The bug was caught on ARM64 kernel v4.1.6, running qemu virt guest,
where it was caused by emulated pl011.

Signed-off-by: Pavel Fedin p.fe...@samsung.com
---
 virt/kvm/arm/vgic.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 34dad3c..bf155e3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -,7 +,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
vgic_irq_clear_active(vcpu, irq);
vgic_update_state(vcpu-kvm);
-   } else if (vgic_dist_irq_is_pending(vcpu, irq)) {
+   } else {
+   WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
vlr.state |= LR_STATE_PENDING;
kvm_debug("Set pending: 0x%x\n", vlr.state);
}
@@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
} else {
if (level_triggered) {
vgic_dist_irq_clear_level(vcpu, irq_num);
-   if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
+   if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
vgic_dist_irq_clear_pending(vcpu, irq_num);
+   vgic_cpu_irq_clear(vcpu, irq_num);
+   }
}
 
ret = false;
-- 
2.4.4



[PATCH 2/2] pci-test: allow bar > 4k

2015-08-27 Thread Michael S. Tsirkin
The vmexit test assumes the BAR size is exactly 4K.
Remove this restriction and allow any BAR size.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 x86/vmexit.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/x86/vmexit.c b/x86/vmexit.c
index 7e9af15..3ff7fbf 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -407,7 +407,7 @@ bool test_wanted(struct test *test, char *wanted[], int nwanted)
 int main(int ac, char **av)
 {
int i;
-   unsigned long membar = 0, base, offset;
+   unsigned long membar = 0, base, offset, size;
void *m;
pcidevaddr_t pcidev;
 
@@ -425,12 +425,18 @@ int main(int ac, char **av)
continue;
}
if (pci_bar_is_memory(pcidev, i)) {
+   int p;
+
+   size = pci_bar_size(pcidev, i);
membar = pci_bar_addr(pcidev, i);
-   base = membar & ~4095;
+   base = membar & ~(size - 1);
offset = membar - base;
-   m = alloc_vpages(1);
-   
-   install_page((void *)read_cr3(), base, m);
+   m = alloc_vpages(size / 4096);
+
+   for (p = 0; p < size; p += 4096) {
+   install_page((void *)read_cr3(), base + p,
+ m + p);
+   }
pci_test.memaddr = m + offset;
} else {
pci_test.iobar = pci_bar_addr(pcidev, i);
-- 
MST



[PATCH] Correct typo based on checkpatch.pl

2015-08-27 Thread Minjune Kim
Signed-off-by: Minjune Kim infinite.minjune@gmail.com
---
 arch/arm/kvm/interrupts.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..30bded6 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -353,7 +353,7 @@ hyp_dabt:
.align
 hyp_hvc:
/*
-* Getting here is either becuase of a trap from a guest or from calling
+* Getting here is either because of a trap from a guest or from calling
 * HVC from the host kernel, which means switch to Hyp mode.
 */
push{r0, r1, r2}
-- 
2.5.0



[PATCH 1/2] pci: add bar sizing

2015-08-27 Thread Michael S. Tsirkin
Will be used for pci-testdev.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 lib/x86/pci.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/lib/x86/pci.c b/lib/x86/pci.c
index 231668a..5116dac 100644
--- a/lib/x86/pci.c
+++ b/lib/x86/pci.c
@@ -19,6 +19,13 @@ static uint32_t pci_config_read(pcidevaddr_t dev, uint8_t reg)
 return inl(0xCFC);
 }
 
+static void pci_config_write(pcidevaddr_t dev, uint8_t reg, uint32_t val)
+{
+    uint32_t index = reg | (dev << 8) | (0x1 << 31);
+    outl(0xCF8, index);
+    outl(0xCFC, val);
+}
+
 /* Scan bus look for a specific device. Only bus 0 scanned for now. */
 pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
 {
@@ -42,6 +49,22 @@ unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num)
 }
 }
 
+unsigned long pci_bar_size(pcidevaddr_t dev, int bar_num)
+{
+uint32_t bar = pci_config_read(dev, PCI_BASE_ADDRESS_0 + bar_num * 4);
+uint32_t mask;
+
+pci_config_write(dev, PCI_BASE_ADDRESS_0 + bar_num * 4, 0x);
+mask = pci_config_read(dev, PCI_BASE_ADDRESS_0 + bar_num * 4);
+pci_config_write(dev, PCI_BASE_ADDRESS_0 + bar_num * 4, bar);
+
+if (bar & PCI_BASE_ADDRESS_SPACE_IO) {
+return ~(mask & PCI_BASE_ADDRESS_IO_MASK) + 1;
+} else {
+return ~(mask & PCI_BASE_ADDRESS_MEM_MASK) + 1;
+}
+}
+
 bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num)
 {
 uint32_t bar = pci_config_read(dev, PCI_BASE_ADDRESS_0 + bar_num * 4);
-- 
MST



KVM: Security Policy

2015-08-27 Thread Stefan Geißler

Hello kvm mailing list,

I assume this is a rather uncommon mailing list post, since it is not
directly related to the usage or development of KVM. Instead, the
following is the case:


I am a student of computer science, currently working on my master's
thesis. The working title is Mining vulnerability databases for
information on hypervisor vulnerabilities: Analyses and Predictions. In
the context of this research I am analyzing various security-related
aspects of different hypervisors, including KVM (a simple example from
my analysis is the discovery process of security vulnerabilities and how
the total number of disclosed vulnerabilities develops over time).


The reason I am writing this post to the public mailing list is that I
am looking for someone who might be willing to support me during my work
with (for example) information and/or personal experience. Or simply
put: may I post questions and ask for help explaining my findings from
time to time, or is this too far off-topic for this mailing list?


For now the question would be whether there is some kind of formal
documentation of the vulnerability disclosure process, or a security
policy specific to KVM?


If someone has any information regarding this, feel free to contact me 
directly through my personal mail address. Any help and information will 
be greatly appreciated!


If this post is misplaced on this mailing list, maybe someone could point
me to the right place.


Kind regards and thank you in advance,
Stefan Geißler