Re: [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
 @@ -306,6 +354,18 @@ struct dsm_buffer {
  static ram_addr_t dsm_addr;
  static size_t dsm_size;
  
 +struct cmd_out_implemented {

QEMU coding style uses typedef struct {} CamelCase.  Please follow this
convention in all user-defined structs (see ./CODING_STYLE).

  static void dsm_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
  {
 +struct MemoryRegion *dsm_ram_mr = opaque;
 +struct dsm_buffer *dsm;
 +struct dsm_out *out;
 +void *buf;
 +
  assert(val == NOTIFY_VALUE);

The guest should not be able to cause an abort(3).  If val !=
NOTIFY_VALUE we can do nvdebug() and then return.

 +
 +buf = memory_region_get_ram_ptr(dsm_ram_mr);
 +dsm = buf;
 +out = buf;
 +
 +le32_to_cpus(&dsm->handle);
 +le32_to_cpus(&dsm->arg1);
 +le32_to_cpus(&dsm->arg2);

Can SMP guests modify DSM RAM while this thread is running?

We must avoid race conditions.  It's probably better to copy in data
before byte-swapping or checking input values.
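
A minimal sketch of both review points, assuming the guest-visible page is plain shared memory: reject an unexpected notify value without aborting, and take a private snapshot of the request before byte-swapping or validating it. DsmRequest, dsm_snapshot(), dsm_handle_notify() and the value of NOTIFY_VALUE are hypothetical names for illustration, not the patch's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NOTIFY_VALUE 0x99u   /* hypothetical; the patch defines its own value */

/* Fixed-size header of a DSM request as laid out in guest RAM (little endian). */
typedef struct {
    uint32_t handle;
    uint8_t  arg0[16];
    uint32_t arg1;
    uint32_t arg2;
} DsmRequest;

/* Decode a little-endian 32-bit value regardless of host endianness. */
static uint32_t load_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/*
 * Copy the request out of guest-shared memory once, then convert the
 * private copy. Later checks only look at *req, so a vCPU that keeps
 * writing to the shared page cannot change a value after validation.
 */
void dsm_snapshot(const void *shared, DsmRequest *req)
{
    assert(shared != NULL && req != NULL);   /* host-side programming error */
    memcpy(req, shared, sizeof(*req));
    req->handle = load_le32(&req->handle);
    req->arg1   = load_le32(&req->arg1);
    req->arg2   = load_le32(&req->arg2);
}

/* Returns false (instead of aborting) when the guest writes a bad value. */
bool dsm_handle_notify(uint64_t val, const void *shared, DsmRequest *req)
{
    if (val != NOTIFY_VALUE) {
        return false;        /* guest error: log it and ignore the write */
    }
    dsm_snapshot(shared, req);
    return true;
}
```

The memcpy() itself is not atomic with respect to a racing vCPU, but every check that follows operates on the snapshot, so the values that were validated are the values that get used.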
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM slow LAMP guest

2015-08-25 Thread Hansa

On 24-8-2015 1:26, Wanpeng Li wrote:

On 8/24/15 3:18 AM, Hansa wrote:

On 16-7-2015 13:27, Paolo Bonzini wrote:

On 15/07/2015 22:02, C. Bröcker wrote:

What OS is this?  Is it RHEL/CentOS? If so, halt_poll_ns will be in 6.7
which will be out in a few days/weeks.

Paolo

OK. As said CentOS 6.6.
But where do I put this parameter?

You can add kvm.halt_poll_ns=50 to the kernel command line.  If
you have the parameter, you have the
/sys/module/kvm/parameters/halt_poll_ns file.

Hi,

I upgraded to the CentOS 6.7 release which came out last month and as promised 
the halt_poll_ns parameter was available.
Last week I tested the availability status every 5 minutes on my Wordpress VMs 
with the halt_poll_ns kernel param set on DOM0. I'm pleased to announce that it 
solves the problem!


How many seconds does your Wordpress site take to load this time?

Regards,
Wanpeng Li

The average is around 0.4 seconds to load my heaviest site on my slowest 
machine.

On the VM server I issued the command below every eleven minutes:

date >> curltest-file; \
top -b -n 1 | sed -n '7,12p' >> curltest-file; \
curl -o /dev/null -s -w "time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file

This gives me the total time for displaying my site on a local machine. It also 
includes a 'top' command to display which processes are running at each sample. 
All is saved in a file called curltest-file.

I found 7 occurrences in my curltest-file of a time_total larger than 20 
seconds. Top however didn't show any significant CPU or IO activity at those 
sampled times. Further investigation shows that they are related to a known 
(gravatar) issue in the Wordpress Jetpack plugin. I didn't include these 
samples in the average total.

Cheers and good luck tweaking your sites!
Best, Hansa


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
 Changelog:
 - Use little endian for DSM method, thanks for Stefan's suggestion
 
 - introduce a new parameter, @configdata, if it's false, Qemu will
   build a static and readonly namespace in memory and use it to serve
   DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
   reserved region is needed at the end of the @file, it is good for
   the user who wants to pass the whole nvdimm device and make its data
   completely visible to the guest
 
 - divide the source code into separate files and add maintainer info

I have skipped ACPI patches because I'm not very familiar with that
area.

Have you thought about live migration?

Are the contents of the NVDIMM migrated since they are registered as a
RAM region?

Stefan


Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1

2015-08-25 Thread Marc Zyngier
Hi Thomas,

On 25/08/15 16:46, Thomas Gleixner wrote:
 On Tue, 25 Aug 2015, Marc Zyngier wrote:
 +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 +
  #ifndef MAX_GIC_NR
  #define MAX_GIC_NR  1
  #endif
 @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
  return d->hwirq;
  }
  
 +static inline bool primary_gic_irq(struct irq_data *d)
 +{
 +if (MAX_GIC_NR > 1)
 +return irq_data_get_irq_chip_data(d) == &gic_data[0];
 +
 +return true;
 +}
 +
  /*
   * Routines to acknowledge, disable and enable interrupts
   */
 @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
  
  static void gic_eoi_irq(struct irq_data *d)
  {
 -writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
 +u32 deact_offset = GIC_CPU_EOI;
 +
 +if (static_key_true(&supports_deactivate)) {
 +if (primary_gic_irq(d))
 +deact_offset = GIC_CPU_DEACTIVATE;
 
 I really wonder for the whole series whether you really want all that
 static key dance and extra conditionals in the callbacks instead of
 just using separate irq chips for the different interrupts.

Hmmm. We definitely could have different irqchips between primary and
secondary controllers indeed. We'd still need a static key for the
gic_handle_irq path though, but that's not too bad.

Let me hack something, and I'll come back to you ;-).

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-25 Thread David Matlack
Thanks for writing v2, Wanpeng.

On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote:
 There is a downside of halt_poll_ns since polling still happens for idle
 VCPUs, which can waste CPU time. This patch adds the ability to adjust
 halt_poll_ns dynamically.

What testing have you done with these patches? Do you know if this removes
the overhead of polling in idle VCPUs? Do we lose any of the performance
from always polling?


 There are two new kernel parameters for changing the halt_poll_ns:
 halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
 halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
 rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow
 matrix is suggested by David:

 if (poll successfully for interrupt): stay the same
   else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
   else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

The way you implemented this wasn't what I expected. I thought you would time
the whole function (kvm_vcpu_block). But I like your approach better. It's
simpler and [by inspection] does what we want.


   halt_poll_ns_shrink/ |
   halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
   ---------------------+----------------------+------------------------
   < 1                  |  = halt_poll_ns      |  = 0
   < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
   otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

I was curious why you went with this approach rather than just the
middle row, or just the last row. Do you think we'll want the extra
flexibility?
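
Read as code, the table amounts to the following. This is a user-space sketch of the policy as the patch describes it, with the module parameters passed in as a struct; struct poll_params and both function names are made up here. Following the quoted __grow_halt_poll_ns(), a per-vcpu value of 0 restarts at halt_poll_ns. The saturating subtraction in the last row is an addition of this sketch:

```c
#include <assert.h>
#include <stddef.h>

struct poll_params {
    unsigned int halt_poll_ns;   /* base value */
    unsigned int grow;           /* halt_poll_ns_grow */
    unsigned int shrink;         /* halt_poll_ns_shrink */
    unsigned int max;            /* halt_poll_ns_max */
};

unsigned int grow_poll_ns(const struct poll_params *p, unsigned int val)
{
    assert(p != NULL);

    if (p->grow < 1)
        return p->halt_poll_ns;          /* row "< 1" */
    if (val > p->max)
        val = p->max;
    if (val == 0)
        return p->halt_poll_ns;          /* restart from the base value */
    if (p->grow < p->halt_poll_ns)
        return val * p->grow;            /* row "< halt_poll_ns" */
    return val + p->grow;                /* row "otherwise" */
}

unsigned int shrink_poll_ns(const struct poll_params *p, unsigned int val)
{
    assert(p != NULL);

    if (p->shrink < 1)
        return 0;                        /* row "< 1" */
    if (p->shrink < p->halt_poll_ns)
        return val / p->shrink;          /* row "< halt_poll_ns" */
    /* row "otherwise"; saturate instead of wrapping below zero */
    return val > p->shrink ? val - p->shrink : 0;
}
```

Written this way it is easier to see that a single parameter selects between three behaviours depending on its magnitude, which is the flexibility being asked about.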


 Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
 ---
  virt/kvm/kvm_main.c | 65 
 -
  1 file changed, 64 insertions(+), 1 deletion(-)

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 93db833..2a4962b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -66,9 +66,26 @@
  MODULE_AUTHOR("Qumranet");
  MODULE_LICENSE("GPL");

 -static unsigned int halt_poll_ns;
 +#define KVM_HALT_POLL_NS  50
 +#define KVM_HALT_POLL_NS_GROW   2
 +#define KVM_HALT_POLL_NS_SHRINK 0
 +#define KVM_HALT_POLL_NS_MAX 200

The macros are not necessary. Also, hard coding the numbers in the param
definitions will make reading the comments above them easier.

 +
 +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
  module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);

 +/* Default doubles per-vcpu halt_poll_ns. */
 +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
 +module_param(halt_poll_ns_grow, int, S_IRUGO);
 +
 +/* Default resets per-vcpu halt_poll_ns . */
 +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
 +module_param(halt_poll_ns_shrink, int, S_IRUGO);
 +
 +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */

Ah, I misspoke before. I was thinking about round-trip latency. The latency
of a single halt is reduced by about 5-7 us.

 +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
 +module_param(halt_poll_ns_max, int, S_IRUGO);

We can remove halt_poll_ns_max. vcpu->halt_poll_ns can always start at zero
and grow from there. Then we just need one module param to keep
vcpu->halt_poll_ns from growing too large.

[ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max,
  but since halt_poll_ns already exists in upstream kernels, we probably can't
  remove it. ]

 +
  /*
   * Ordering of locks:
   *
 @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, 
 gfn_t gfn)
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

 +static unsigned int __grow_halt_poll_ns(unsigned int val)
 +{
 +   if (halt_poll_ns_grow < 1)
 +   return halt_poll_ns;
 +
 +   val = min(val, halt_poll_ns_max);
 +
 +   if (val == 0)
 +   return halt_poll_ns;
 +
 +   if (halt_poll_ns_grow < halt_poll_ns)
 +   val *= halt_poll_ns_grow;
 +   else
 +   val += halt_poll_ns_grow;
 +
 +   return val;
 +}
 +
 +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)

minimum never gets used.

 +{
 +   if (modifier < 1)
 +   return 0;
 +
 +   if (modifier < halt_poll_ns)
 +   val /= modifier;
 +   else
 +   val -= modifier;
 +
 +   return val;
 +}
 +
 +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)

These wrappers aren't necessary.

 +{
 +   vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
 +}
 +
 +static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
 +{
 +   vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
 +   halt_poll_ns_shrink, halt_poll_ns);
 +}
 +
  static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
  {
 if (kvm_arch_vcpu_runnable(vcpu)) {
 @@ -1954,6 +2013,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
  

Re: [PATCH v2 10/18] nvdimm: init the address region used by DSM method

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:03PM +0800, Xiao Guangrong wrote:
 @@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char 
 *buf)
  }
  }
  
 +struct dsm_buffer {
 +/* RAM page. */
 +uint32_t handle;
 +uint8_t arg0[16];
 +uint32_t arg1;
 +uint32_t arg2;
 +union {
 +char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
 +};
 +
 +/* MMIO page. */
 +union {
 +uint32_t notify;
 +char pedding[PAGE_SIZE];

s/pedding/padding/

 +};
 +};
 +
 +static ram_addr_t dsm_addr;
 +static size_t dsm_size;
 +
 +static uint64_t dsm_read(void *opaque, hwaddr addr,
 + unsigned size)
 +{
 +return 0;
 +}
 +
 +static void dsm_write(void *opaque, hwaddr addr,
 +  uint64_t val, unsigned size)
 +{
 +}
 +
 +static const MemoryRegionOps dsm_ops = {
 +.read = dsm_read,
 +.write = dsm_write,
 +.endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static int build_dsm_buffer(void)
 +{
 +MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
 +ram_addr_t addr;;

s/;;/;/


Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
 +#ifdef NVDIMM_DEBUG
 +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
 +#else
 +#define nvdebug(...)
 +#endif

The following allows the compiler to check format strings and syntax
check the argument expressions:

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */
#define nvdebug(fmt, ...) \
    if (NVDIMM_DEBUG) { \
        fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
    }

This approach avoids bitrot (e.g. debug format string arguments have
become outdated).
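
A self-contained variant of that macro, as a sketch. Wrapping the `if` in `do { } while (0)` is an addition here, not part of the suggestion above; it makes the macro behave as a single statement inside an unbraced if/else. lookup_label() is a made-up function that only exists to show the call is still compiled and type-checked when debugging is off:

```c
#include <assert.h>
#include <stdio.h>

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */

/* "## __VA_ARGS__" is a GNU extension, as in the original macro. */
#define nvdebug(fmt, ...)                                        \
    do {                                                         \
        if (NVDIMM_DEBUG) {                                      \
            fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__);     \
        }                                                        \
    } while (0)

int lookup_label(int index)
{
    if (index < 0)
        nvdebug("bad index %d\n", index);   /* checked even when disabled */
    else
        index *= 2;
    return index;
}
```

Because NVDIMM_DEBUG is a compile-time constant, the compiler can discard the fprintf() call entirely when it is 0, while a stale format string or a renamed variable still produces a diagnostic.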


Re: [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:08PM +0800, Xiao Guangrong wrote:
 Function 4 is used to get Namespace lable size

s/lable/label/


Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-08-25 Thread Andre Przywara
Hi Eric,

On 14/08/15 12:58, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The LPI configuration and pending tables of the GICv3 LPIs are held
 in tables in (guest) memory. To achieve reasonable performance, we
 cache this data in our own data structures, so we need to sync those
 two views from time to time. This behaviour is well described in the
 GICv3 spec and is also exercised by hardware, so the sync points are
 well known.

 Provide functions that read the guest memory and store the
 information from the configuration and pending tables in the kernel.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
 would help to have change log between v1 - v2 (valid for the whole series)
  include/kvm/arm_vgic.h  |   2 +
  virt/kvm/arm/its-emul.c | 124 
 
  virt/kvm/arm/its-emul.h |   3 ++
  3 files changed, 129 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 2a67a10..323c33a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -167,6 +167,8 @@ struct vgic_its {
  int cwriter;
  struct list_head device_list;
  struct list_head collection_list;
 +/* memory used for buffering guest's memory */
 +void *buffer_page;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b9c40d7..05245cb 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -50,6 +50,7 @@ struct its_itte {
  struct its_collection *collection;
  u32 lpi;
  u32 event_id;
 +u8 priority;
  bool enabled;
  unsigned long *pending;
  };
 @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm 
 *kvm, int lpi)
  return NULL;
  }
  
 +#define LPI_PROP_ENABLE_BIT(p)  ((p) & LPI_PROP_ENABLED)
 +#define LPI_PROP_PRIORITY(p)    ((p) & 0xfc)
 +
 +/* stores the priority and enable bit for a given LPI */
 +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 
 prop)
 +{
 +itte->priority = LPI_PROP_PRIORITY(prop);
 +itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
 +}
 +
 +#define GIC_LPI_OFFSET 8192
 +
 +/* We scan the table in chunks the size of the smallest page size */
 4kB chunks?

Marc was complaining about this wording, I think. The rationale was that
4K is already in the code and thus does not need to be repeated in the
comment, whereas the comment should explain the meaning of the value.
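
To make the comment's intent concrete, here is a user-space sketch of scanning a byte table in fixed-size chunks through a bounce buffer, in the style of the quoted loop. read_table() stands in for kvm_read_guest() and, like scan_table(), the callback type and the use of bit 0 as the enable bit, is an assumption of this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096U   /* size of the bounce buffer used for each read */

typedef void (*entry_fn)(size_t index, uint8_t value, void *opaque);

/* Stand-in for kvm_read_guest(): copy len bytes starting at offset. */
static int read_table(const uint8_t *table, size_t offset, void *dst, size_t len)
{
    memcpy(dst, table + offset, len);
    return 0;
}

/* Walk the whole table, at most CHUNK_SIZE bytes at a time. */
int scan_table(const uint8_t *table, size_t size, entry_fn fn, void *opaque)
{
    uint8_t buf[CHUNK_SIZE];
    size_t done = 0;

    assert(table != NULL && fn != NULL);
    while (done < size) {
        size_t chunk = size - done < CHUNK_SIZE ? size - done : CHUNK_SIZE;
        int ret = read_table(table, done, buf, chunk);

        if (ret)
            return ret;
        for (size_t i = 0; i < chunk; i++)
            fn(done + i, buf[i], opaque);
        done += chunk;
    }
    return 0;
}

/* Example callback: count entries with the enable bit (bit 0) set. */
void count_enabled(size_t index, uint8_t value, void *opaque)
{
    (void)index;
    if (value & 0x1)
        ++*(size_t *)opaque;
}
```

The last chunk is allowed to be short, so the table size does not have to be a multiple of CHUNK_SIZE.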

 +#define CHUNK_SIZE 4096U
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
  
 +static int nr_idbits_propbase(u64 propbaser)
 +{
 +int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
 +
 +return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
 +}
 +
 +/*
 + * Scan the whole LPI configuration table and put the LPI configuration
 + * data in our own data structures. This relies on the LPI being
 + * mapped before.
 + */
 +static bool its_update_lpis_configuration(struct kvm *kvm)
 +{
 +struct vgic_dist *dist = &kvm->arch.vgic;
 +u8 *prop = dist->its.buffer_page;
 +u32 tsize;
 +gpa_t propbase;
 +int lpi = GIC_LPI_OFFSET;
 +struct its_itte *itte;
 +struct its_device *device;
 +int ret;
 +
 +propbase = BASER_BASE_ADDRESS(dist->propbaser);
 +tsize = nr_idbits_propbase(dist->propbaser);
 +
 +while (tsize > 0) {
 +int chunksize = min(tsize, CHUNK_SIZE);
 +
 +ret = kvm_read_guest(kvm, propbase, prop, chunksize);
 I think you still have the spin_lock issue  since if my understanding is
 correct this is called from
 vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
 where vgic_handle_mmio_access. Or does it take another path?

Well, it's (also) called on handling the INVALL command, but you are
right that on that enable path the dist lock is held. I reckon that this
init part isn't racy so that shouldn't be a problem (famous last words ;-).
Let me see whether I can find a way to just drop the lock around the
while loop.

Cheers,
Andre.

 
 Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
 
 Eric


Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
 +static void set_file(Object *obj, const char *str, Error **errp)
 +{
 +PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
 +
 +if (nvdimm->file) {
 +g_free(nvdimm->file);
 +}

g_free(NULL) is a nop so it's safe to replace the if with just
g_free(nvdimm->file).
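
The same holds for the C library's free(), which the C standard defines as a no-op on a null pointer, so a setter can release the old value unconditionally. A minimal sketch using free()/malloc() in place of GLib's g_free()/g_strdup(); the Device struct and device_set_file() are illustrative names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *file;
} Device;

/* Replace dev->file with a copy of str; no NULL check before free(). */
void device_set_file(Device *dev, const char *str)
{
    size_t len;
    char *copy;

    assert(dev != NULL && str != NULL);
    len = strlen(str) + 1;
    copy = malloc(len);
    if (copy) {
        memcpy(copy, str, len);
    }
    free(dev->file);      /* fine even if dev->file is NULL */
    dev->file = copy;     /* NULL if the allocation failed */
}
```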


Re: [PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-08-25 Thread Andre Przywara
Hi Eric,

On 14/08/15 12:58, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 As the actual LPI number in a guest can be quite high, but is mostly
 assigned using a very sparse allocation scheme, bitmaps and arrays
 for storing the virtual interrupt status are a waste of memory.
 We use our equivalent of the Interrupt Translation Table Entry
 (ITTE) to hold this extra status information for a virtual LPI.
 As the normal VGIC code cannot use its fancy bitmaps to manage
 pending interrupts, we provide a hook in the VGIC code to let the
 ITS emulation handle the list register queueing itself.
 LPIs are located in a separate number range (>=8192), so
 distinguishing them is easy. With LPIs being only edge-triggered, we
 get away with a less complex IRQ handling.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  2 ++
  virt/kvm/arm/its-emul.c | 71 
 
  virt/kvm/arm/its-emul.h |  3 ++
  virt/kvm/arm/vgic-v3-emul.c |  2 ++
  virt/kvm/arm/vgic.c | 72 
 ++---
  5 files changed, 133 insertions(+), 17 deletions(-)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 1648668..2a67a10 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -147,6 +147,8 @@ struct vgic_vm_ops {
   int (*init_model)(struct kvm *);
   void (*destroy_model)(struct kvm *);
   int (*map_resources)(struct kvm *, const struct vgic_params *);
 + bool (*queue_lpis)(struct kvm_vcpu *);
 + void (*unqueue_lpi)(struct kvm_vcpu *, int irq);
  };

  struct vgic_io_device {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index 7f217fa..b9c40d7 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -50,8 +50,26 @@ struct its_itte {
   struct its_collection *collection;
   u32 lpi;
   u32 event_id;
 + bool enabled;
 + unsigned long *pending;
  };

 +#define for_each_lpi(dev, itte, kvm) \
 + list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
 + list_for_each_entry(itte, &(dev)->itt, itte_list)
 +
 You have a checkpatch error here:
 
 ERROR: Macros with complex values should be enclosed in parentheses
 #52: FILE: virt/kvm/arm/its-emul.c:57:
 +#define for_each_lpi(dev, itte, kvm) \
 +   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
 +   list_for_each_entry(itte, &(dev)->itt, itte_list)

I know about that one. The problem is that if I add the parentheses it
breaks the usage below due to the curly brackets. But the definition
above is just so convenient and I couldn't find another neat solution so
far. If you are concerned about that I can give it another try,
otherwise I tend to just ignore checkpatch here.

 +static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
 +{
 can't we have the same LPI present in different interrupt translation
 tables? I don't know it is a sensible setting but I did not succeed in
 finding it was not possible.

Thanks to Marc I am happy (and relieved!) to point you to 6.1.1 LPI INTIDs:
The behavior of the GIC is UNPREDICTABLE if software:
- Maps multiple EventID/DeviceID combinations to the same physical LPI
INTID.

So I exercise the freedom of UNPREDICTABLE here ;-)

 + struct its_device *device;
 + struct its_itte *itte;
 +
 + for_each_lpi(device, itte, kvm) {
 + if (itte->lpi == lpi)
 + return itte;
 + }
 + return NULL;
 +}
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)

  /* The distributor lock is held by the VGIC MMIO handler. */
 @@ -145,6 +163,59 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu 
 *vcpu,
   return false;
  }

 +/*
 + * Find all enabled and pending LPIs and queue them into the list
 + * registers.
 + * The dist lock is held by the caller.
 + */
 +bool vits_queue_lpis(struct kvm_vcpu *vcpu)
 +{
 + struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
 + struct its_device *device;
 + struct its_itte *itte;
 + bool ret = true;
 +
 + if (!vgic_has_its(vcpu->kvm))
 + return true;
 + if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled)
 + return true;
 +
 + spin_lock(&its->lock);
 + for_each_lpi(device, itte, vcpu->kvm) {
 + if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending))
 + continue;
 +
 + if (!itte->collection)
 + continue;
 +
 + if (itte->collection->target_addr != vcpu->vcpu_id)
 + continue;
 +
 + __clear_bit(vcpu->vcpu_id, itte->pending);
 +
 + ret = vgic_queue_irq(vcpu, 0, itte->lpi);
 what if the vgic_queue_irq fails since no LR can be found, the
 itte->pending was cleared so we forget that LPI? shouldn't we restore
 the pending state in ITT? in vgic_queue_hwirq the state change only is
 performed if the 

Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Joe Perches
On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
 All fields of kvm_io_range were initialized or copied explicitly
 afterwards. So switch to use kmalloc().

Is there any compiler added alignment padding
in either structure?  If so, those padding
areas would now be uninitialized and may leak
kernel data if copied to user-space.
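
The concern can be illustrated in user space. In the sketch below, struct sample is a made-up structure (not kvm_io_range); on common ABIs the compiler inserts padding after the one-byte member, and those bytes hold whatever the allocator left there unless the allocation is zeroed first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sample {
    uint8_t  tag;     /* 1 byte, typically followed by padding */
    uint64_t addr;    /* usually 4- or 8-byte aligned */
};

/* Number of bytes in struct sample that belong to no member. */
size_t sample_padding(void)
{
    return sizeof(struct sample) - sizeof(uint8_t) - sizeof(uint64_t);
}

/*
 * calloc() plays the role of kzalloc(): the whole allocation, padding
 * included, starts out as zero. With malloc() (the kmalloc() analogue)
 * only the bytes written through the members would be defined.
 */
struct sample *sample_new_zeroed(uint8_t tag, uint64_t addr)
{
    struct sample *s = calloc(1, sizeof(*s));

    if (s) {
        s->tag = tag;
        s->addr = addr;
    }
    return s;
}
```

Whether this matters for the patch depends on whether the structures are ever copied out wholesale; the sketch only shows where uninitialized bytes can hide.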

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
[]
 @@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum 
 kvm_bus bus_idx, gpa_t addr,
   if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
   return -ENOSPC;
  
 - new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
 + new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
 sizeof(struct kvm_io_range)), GFP_KERNEL);
   if (!new_bus)
   return -ENOMEM;
 @@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum 
 kvm_bus bus_idx,
   if (r)
   return r;
  
 - new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 + new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 sizeof(struct kvm_io_range)), GFP_KERNEL);
   if (!new_bus)
   return -ENOMEM;





Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
 diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
 index a53d235..7a270a8 100644
 --- a/hw/mem/nvdimm/pc-nvdimm.c
 +++ b/hw/mem/nvdimm/pc-nvdimm.c
 @@ -24,6 +24,19 @@
  
  #include "hw/mem/pc-nvdimm.h"
  
 +#define PAGE_SIZE  (1UL << 12)

This macro name is likely to collide with system headers or other code.

Could you use the existing TARGET_PAGE_SIZE constant instead?


Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
 The parameter @file is used as backed memory for NVDIMM which is
 divided into two parts if @dataconfig is true:

s/dataconfig/configdata/

 @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
   set_configdata, NULL);
  }
  
 +static uint64_t get_file_size(int fd)
 +{
 +struct stat stat_buf;
 +uint64_t size;
 +
 +if (fstat(fd, &stat_buf) < 0) {
 +return 0;
 +}
 +
 +if (S_ISREG(stat_buf.st_mode)) {
 +return stat_buf.st_size;
 +}
 +
 +if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
 +return size;
 +}

#ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?

There is nothing Linux-specific about emulating NVDIMMs so this code
should compile on all platforms.
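
One way to keep the block-device query while still compiling elsewhere is to guard only the Linux-specific ioctl, as sketched below. This is an illustration rather than the patch's code; on non-Linux hosts it simply reports 0 for block devices:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>     /* BLKGETSIZE64 */
#endif

/* Returns the size of the file or block device behind fd, or 0 on failure. */
uint64_t get_file_size(int fd)
{
    struct stat stat_buf;

    if (fstat(fd, &stat_buf) < 0) {
        return 0;
    }

    if (S_ISREG(stat_buf.st_mode)) {
        return (uint64_t)stat_buf.st_size;
    }

#ifdef __linux__
    if (S_ISBLK(stat_buf.st_mode)) {
        uint64_t size;

        if (ioctl(fd, BLKGETSIZE64, &size) == 0) {
            return size;
        }
    }
#endif

    return 0;
}
```

Other hosts have their own ioctls for querying block device sizes; those could be added as further guarded branches without touching the common path.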

 +
 +return 0;
 +}
 +
  static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
  {
  PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
 +char name[512];
 +void *buf;
 +ram_addr_t addr;
 +uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
 +int fd;
  
  if (!nvdimm->file) {
  error_setg(errp, "file property is not set");
  }

Missing return here.

 +
 +fd = open(nvdimm->file, O_RDWR);

Does it make sense to support read-only NVDIMMs?

It could be handy for sharing a read-only file between unprivileged
guests.  The permissions on the file would only allow read, not write.

 +if (fd < 0) {
 +error_setg(errp, "can not open %s", nvdimm->file);

s/can not/cannot/

 +return;
 +}
 +
 +size = get_file_size(fd);
 +buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
This can be added in the future.

 +if (buf == MAP_FAILED) {
 +error_setg(errp, "can not do mmap on %s", nvdimm->file);
 +goto do_close;
 +}
 +
 +nvdimm->config_data_size = config_size;
 +if (nvdimm->configdata) {
 +/* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. */
 +nvdimm_size = size - config_size;
 +nvdimm->config_data_addr = buf + nvdimm_size;
 +} else {
 +nvdimm_size = size;
 +nvdimm->config_data_addr = NULL;
 +}
 +
 +if ((int64_t)nvdimm_size <= 0) {

The error cases can be detected before mmap(2).  That avoids the int64_t
cast and also avoids nvdimm_size underflow and the bogus
nvdimm->config_data_addr calculation above.

size = get_file_size(fd);
if (size == 0) {
    error_setg(errp, "empty file or unable to get file size");
    goto do_close;
} else if (nvdimm->configdata && size < config_size) {
    error_setg(errp, "file size is too small to store NVDIMM"
                     " configure data");
    goto do_close;
}

 +error_setg(errp, "file size is too small to store NVDIMM"
 +  " configure data");
 +goto do_unmap;
 +}
 +
 +addr = reserved_range_push(nvdimm_size);
 +if (!addr) {
 +error_setg(errp, "do not have enough space for size %#lx.\n", size);

error_setg() messages must not have a newline at the end.

Please use "%#" PRIx64 instead of "%#lx" so compilation works on 32-bit
hosts where sizeof(long) == 4.
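
For reference, a small sketch of the portable form. format_size_error() is a made-up helper that uses snprintf() in place of QEMU's error_setg():

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats the message without a trailing newline. */
int format_size_error(char *buf, size_t len, uint64_t size)
{
    assert(buf != NULL && len > 0);
    /* PRIx64 expands to the right length modifier for uint64_t. */
    return snprintf(buf, len, "do not have enough space for size %#" PRIx64,
                    size);
}
```

PRIx64 comes from <inttypes.h> and expands to a string literal, which is why it sits outside the quotes and is joined to the format by literal concatenation.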

 +goto do_unmap;
 +}
 +
 +nvdimm->device_index = new_device_index();
 +sprintf(name, "NVDIMM-%d", nvdimm->device_index);
 +memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
 +   buf);

How is the autogenerated name used?

Why not just use "pc-nvdimm.memory"?

 +vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
 +memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
 +
 +return;

fd is leaked.


Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
 NVDIMM reserves all the free range above 4G to do:
 - Persistent Memory (PMEM) mapping
 - implement NVDIMM ACPI device _DSM method
 
 Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
 ---
  hw/i386/pc.c   | 12 ++--
  hw/mem/nvdimm/pc-nvdimm.c  | 13 +
  include/hw/mem/pc-nvdimm.h |  1 +
  3 files changed, 24 insertions(+), 2 deletions(-)

CCing Igor for memory hotplug-related changes.

 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index 7661ea9..41af6ea 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
 @@ -64,6 +64,7 @@
  #include "hw/pci/pci_host.h"
  #include "acpi-build.h"
  #include "hw/mem/pc-dimm.h"
 +#include "hw/mem/pc-nvdimm.h"
  #include "qapi/visitor.h"
  #include "qapi-visit.h"
  
 @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
  MemoryRegion *ram_below_4g, *ram_above_4g;
  FWCfgState *fw_cfg;
  PCMachineState *pcms = PC_MACHINE(machine);
 +ram_addr_t offset;
  
  assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
  
 @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
  exit(EXIT_FAILURE);
  }
  
 +offset = 0x100000000ULL + above_4g_mem_size;
 +
  /* initialize hotplug memory address space */
  if (guest_info->has_reserved_memory &&
  (machine->ram_size < machine->maxram_size)) {
 @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
  exit(EXIT_FAILURE);
  }
  
 -pcms->hotplug_memory.base =
 -ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
 +pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
  
  if (pcms->enforce_aligned_dimm) {
  /* size hotplug region assuming 1G page max alignment per slot */
 @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
 "hotplug-memory", hotplug_mem_size);
  memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
  &pcms->hotplug_memory.mr);
 +
 +offset = pcms->hotplug_memory.base + hotplug_mem_size;
  }
  
 + /* all the space left above 4G is reserved for NVDIMM. */
 +pc_nvdimm_reserve_range(offset);
 +
  /* Initialize PC system firmware */
  pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
  
 diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
 index a53d235..7a270a8 100644
 --- a/hw/mem/nvdimm/pc-nvdimm.c
 +++ b/hw/mem/nvdimm/pc-nvdimm.c
 @@ -24,6 +24,19 @@
  
  #include "hw/mem/pc-nvdimm.h"
  
 +#define PAGE_SIZE  (1UL << 12)
 +
 +static struct nvdimms_info {
 +ram_addr_t current_addr;
 +} nvdimms_info;
 +
 +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
 +void pc_nvdimm_reserve_range(ram_addr_t offset)
 +{
 +offset = ROUND_UP(offset, PAGE_SIZE);
 +nvdimms_info.current_addr = offset;
 +}
 +
  static char *get_file(Object *obj, Error **errp)
  {
  PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
 diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
 index 51152b8..8601e9b 100644
 --- a/include/hw/mem/pc-nvdimm.h
 +++ b/include/hw/mem/pc-nvdimm.h
 @@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
  #define PC_NVDIMM(obj) \
  OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
  
 +void pc_nvdimm_reserve_range(ram_addr_t offset);
  #endif
 -- 
 2.4.3
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1

2015-08-25 Thread Thomas Gleixner
On Tue, 25 Aug 2015, Marc Zyngier wrote:
 +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 +
  #ifndef MAX_GIC_NR
  #define MAX_GIC_NR   1
  #endif
 @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
   return d->hwirq;
  }
  
 +static inline bool primary_gic_irq(struct irq_data *d)
 +{
 + if (MAX_GIC_NR > 1)
 +         return irq_data_get_irq_chip_data(d) == &gic_data[0];
 +
 + return true;
 +}
 +
  /*
   * Routines to acknowledge, disable and enable interrupts
   */
 @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
  
  static void gic_eoi_irq(struct irq_data *d)
  {
 - writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
 + u32 deact_offset = GIC_CPU_EOI;
 +
 + if (static_key_true(&supports_deactivate)) {
 + if (primary_gic_irq(d))
 + deact_offset = GIC_CPU_DEACTIVATE;

I really wonder for the whole series whether you really want all that
static key dance and extra conditionals in the callbacks instead of
just using separate irq chips for the different interrupts.

Thanks,

tglx
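
The "separate irq chips" idea can be sketched roughly like this. It is a user-space illustration under stated assumptions, not kernel code: struct demo_irq_chip, the DEMO_* offsets and all function names are hypothetical. The point is that the choice between EOI and DEACTIVATE is made once, when the interrupt is mapped, so the callback itself carries no conditional.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, loosely modelled on the GIC CPU interface. */
enum { DEMO_CPU_EOI = 0x10, DEMO_CPU_DEACTIVATE = 0x1000 };

struct demo_irq;

/* Minimal stand-in for struct irq_chip: only the EOI callback is modelled. */
struct demo_irq_chip {
    const char *name;
    uint32_t (*eoi_offset)(const struct demo_irq *irq);
};

struct demo_irq {
    uint32_t hwirq;
    const struct demo_irq_chip *chip;
};

static uint32_t demo_eoi_mode0(const struct demo_irq *irq)
{
    (void)irq;
    return DEMO_CPU_EOI;
}

static uint32_t demo_eoi_mode1(const struct demo_irq *irq)
{
    (void)irq;
    return DEMO_CPU_DEACTIVATE;
}

const struct demo_irq_chip demo_chip_eoimode0 = { "demo-gic", demo_eoi_mode0 };
const struct demo_irq_chip demo_chip_eoimode1 = { "demo-gic-eoimode1", demo_eoi_mode1 };

/* The decision is taken once, at mapping time. */
void demo_map_irq(struct demo_irq *irq, uint32_t hwirq,
                  int is_primary, int supports_deactivate)
{
    assert(irq != NULL);
    irq->hwirq = hwirq;
    irq->chip = (is_primary && supports_deactivate)
                    ? &demo_chip_eoimode1
                    : &demo_chip_eoimode0;
}

/* Hot path: one indirect call, no static key and no extra branch. */
uint32_t demo_eoi(const struct demo_irq *irq)
{
    return irq->chip->eoi_offset(irq);
}
```

Whether the indirect call is cheaper than a patched static-key branch is a separate question that this sketch does not answer.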


Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-08-25 Thread Andre Przywara
Hi Eric,

On 14/08/15 13:35, Eric Auger wrote:
 On 08/14/2015 01:58 PM, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The LPI configuration and pending tables of the GICv3 LPIs are held
 in tables in (guest) memory. To achieve reasonable performance, we
 cache this data in our own data structures, so we need to sync those
 two views from time to time. This behaviour is well described in the
 GICv3 spec and is also exercised by hardware, so the sync points are
 well known.

 Provide functions that read the guest memory and store the
 information from the configuration and pending tables in the kernel.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
 would help to have change log between v1 - v2 (valid for the whole series)
  include/kvm/arm_vgic.h  |   2 +
  virt/kvm/arm/its-emul.c | 124 
 
  virt/kvm/arm/its-emul.h |   3 ++
  3 files changed, 129 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 2a67a10..323c33a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -167,6 +167,8 @@ struct vgic_its {
 int cwriter;
 struct list_headdevice_list;
 struct list_headcollection_list;
 +   /* memory used for buffering guest's memory */
 +   void*buffer_page;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b9c40d7..05245cb 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -50,6 +50,7 @@ struct its_itte {
 struct its_collection *collection;
 u32 lpi;
 u32 event_id;
 +   u8 priority;
 bool enabled;
 unsigned long *pending;
  };
 @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm 
 *kvm, int lpi)
 return NULL;
  }
  
 +#define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
 +#define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
 +
 +/* stores the priority and enable bit for a given LPI */
 +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
 +{
 +   itte->priority = LPI_PROP_PRIORITY(prop);
 +   itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
 +}
 +
 +#define GIC_LPI_OFFSET 8192
 +
 +/* We scan the table in chunks the size of the smallest page size */
 4kB chunks?
 +#define CHUNK_SIZE 4096U
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL)
  
 +static int nr_idbits_propbase(u64 propbaser)
 +{
 +   int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
 +
 +   return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
 +}
 +
 +/*
 + * Scan the whole LPI configuration table and put the LPI configuration
 + * data in our own data structures. This relies on the LPI being
 + * mapped before.
 + */
 +static bool its_update_lpis_configuration(struct kvm *kvm)
 +{
 +   struct vgic_dist *dist = &kvm->arch.vgic;
 +   u8 *prop = dist->its.buffer_page;
 +   u32 tsize;
 +   gpa_t propbase;
 +   int lpi = GIC_LPI_OFFSET;
 +   struct its_itte *itte;
 +   struct its_device *device;
 +   int ret;
 +
 +   propbase = BASER_BASE_ADDRESS(dist->propbaser);
 +   tsize = nr_idbits_propbase(dist->propbaser);
 +
 +   while (tsize > 0) {
 +           int chunksize = min(tsize, CHUNK_SIZE);
 +
 +           ret = kvm_read_guest(kvm, propbase, prop, chunksize);
 I think you still have the spin_lock issue  since if my understanding is
 correct this is called from
 vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
 where vgic_handle_mmio_access. Or does it take another path?

 Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
 
 Sorry, I forgot that this was the case already. But currently we always
 register the same io ops (the registration entry point being
 vgic_register_kvm_io_dev), and maybe we should have separate dispatcher
 functions for dist, redist and its?

What would be the idea behind it? To have separate locks for each? I
don't think that will work, as some ITS functions are called from GICv3
register handler functions which manipulate members of the distributor
structure. So I am more in favour of dropping the dist lock in these
cases before handing off execution to ITS specific functions.

Cheers,
Andre.
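
For reference, the chunked table scan from the quoted patch can be sketched in user space as below. This is only an illustration: demo_read_guest() is a hypothetical stand-in for kvm_read_guest() that copies from a flat buffer, the DEMO_* names are invented, and locking (the actual topic of this sub-thread) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LPI_PROP_ENABLED     0x01u            /* bit 0: LPI enabled */
#define DEMO_LPI_PROP_PRIORITY(p) ((p) & 0xfcu)    /* bits 7:2: priority */
#define DEMO_CHUNK_SIZE           4096u

struct demo_lpi {
    uint8_t priority;
    int enabled;
};

/* Stand-in for kvm_read_guest(): copy from a flat buffer, -1 if out of range. */
static int demo_read_guest(const uint8_t *mem, size_t mem_size,
                           size_t gpa, void *dst, size_t len)
{
    if (gpa > mem_size || len > mem_size - gpa)
        return -1;
    memcpy(dst, mem + gpa, len);
    return 0;
}

/* Walk tsize configuration bytes starting at base, one chunk at a time,
 * and decode each byte into out[]. Returns 0 on success, -1 on read error. */
int demo_scan_config(const uint8_t *mem, size_t mem_size,
                     size_t base, size_t tsize, struct demo_lpi *out)
{
    static uint8_t chunk[DEMO_CHUNK_SIZE];
    size_t done = 0;

    assert(mem != NULL && out != NULL);
    while (tsize > 0) {
        size_t n = tsize < DEMO_CHUNK_SIZE ? tsize : DEMO_CHUNK_SIZE;

        if (demo_read_guest(mem, mem_size, base + done, chunk, n) != 0)
            return -1;
        for (size_t i = 0; i < n; i++) {
            out[done + i].priority = (uint8_t)DEMO_LPI_PROP_PRIORITY(chunk[i]);
            out[done + i].enabled = (chunk[i] & DEMO_LPI_PROP_ENABLED) != 0;
        }
        done += n;
        tsize -= n;
    }
    return 0;
}
```

Reading into a bounce buffer and decoding afterwards keeps the time spent touching guest memory short, which matters if a lock has to be dropped around the read.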


Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Jason Wang


On 08/25/2015 07:34 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote:
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   arch/x86/kvm/trace.h | 17 +
   arch/x86/kvm/vmx.c   |  1 +
   arch/x86/kvm/x86.c   |  1 +
   3 files changed, 19 insertions(+)
  
  diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
  index 4eae7c3..2d4e81a 100644
  --- a/arch/x86/kvm/trace.h
  +++ b/arch/x86/kvm/trace.h
  @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
    __entry->count > 1 ? "(...)" : "")
    );
    
   +TRACE_EVENT(kvm_fast_mmio,
   +  TP_PROTO(u64 gpa),
   +  TP_ARGS(gpa),
   +
   +  TP_STRUCT__entry(
   +          __field(u64,gpa)
   +  ),
   +
   +  TP_fast_assign(
   +          __entry->gpa= gpa;
   +  ),
   +
   +  TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
  +);
  +
  +
  +
 don't add multiple empty lines please.


Ok


Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang


On 08/25/2015 07:33 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote:
  We register a wildcard MMIO eventfd on two buses: KVM_MMIO_BUS and
  KVM_FAST_MMIO_BUS. This leads to two issues:
  
  - kvm_io_bus_destroy() does not know that the devices on the two buses
    point to a single dev, which leads to a double free [1] during exit.
  - A wildcard eventfd ignores the data length, so it is registered as a
    kvm_io_range with zero length. This makes the binary search in
    kvm_io_bus_get_first_dev() fail when we try to emulate through
    KVM_MMIO_BUS, which causes a userspace I/O emulation request instead
    of an eventfd notification (the virtqueue kick is trapped by qemu
    instead of vhost in this case).
  
  Fix this by not registering the wildcard MMIO eventfd on two buses.
  Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the double
  free in kvm_io_bus_destroy(). For the archs/setups that do not utilize
  KVM_FAST_MMIO_BUS, try KVM_FAST_MMIO_BUS first, before searching
  KVM_MMIO_BUS, to see if it has a match.
  
  [1] Panic caused by double free:
  
  CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
  #28-Ubuntu
  Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
  task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
  RIP: 0010:[c07e25d8]  [c07e25d8] 
  ioeventfd_release+0x28/0x60 [kvm]
  RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
  RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
  RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
  RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
  R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
  R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
  FS:  7fc1ee3e6700() GS:88023e24() 
  knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
  Stack:
  88021e7cc000  88020e7f3be8 c07e2622
  88020e7f3c38 c07df69a 880232524160 88020e792d80
    880219b78c00 0008 8802321686a8
  Call Trace:
  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
  [811f69f7] __fput+0xe7/0x250
  [811f6bae] fput+0xe/0x10
  [81093f04] task_work_run+0xd4/0xf0
  [81079358] do_exit+0x368/0xa50
  [81082c8f] ? recalc_sigpending+0x1f/0x60
  [81079ad5] do_group_exit+0x45/0xb0
  [81085c71] get_signal+0x291/0x750
  [810144d8] do_signal+0x28/0xab0
  [810f3a3b] ? do_futex+0xdb/0x5d0
  [810b7028] ? __wake_up_locked_key+0x18/0x20
  [810f3fa6] ? SyS_futex+0x76/0x170
  [81014fc9] do_notify_resume+0x69/0xb0
  [817cb9af] int_signal+0x12/0x17
  Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 
  20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 
  b8 00 01 10 00 00
  RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
  RSP 88020e7f3bc8
  
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
 I'm worried that this slows down the regular MMIO.

I doubt the difference would be measurable.

 Could you share performance #s please?
 You need a mix of len=0 and len=2 matches.

Ok.

 One solution for the first issue is to create two ioeventfd objects instead.

Sounds good.

 For the second issue, we could change bsearch compare function instead.

What do you mean by second issue ?

 Again, this affects all devices, so performance #s would be needed.
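
A rough sketch of what "change the bsearch compare function" could look like, so that a zero-length (wildcard) entry matches any access at exactly its address. This is an assumption-laden illustration using the C library bsearch(); struct demo_range, demo_range_cmp() and demo_find_range() are hypothetical names, not the kernel's kvm_io_range helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_range {
    uint64_t addr;
    unsigned len;   /* 0 means wildcard: match any length at addr */
};

/* bsearch() comparator: k is the access being looked up, e a registered
 * range. The table must be sorted by address and non-overlapping. */
static int demo_range_cmp(const void *k, const void *e)
{
    const struct demo_range *key = k, *ent = e;

    if (key->addr < ent->addr)
        return -1;
    if (ent->len == 0)              /* wildcard: exact address only */
        return key->addr > ent->addr ? 1 : 0;
    if (key->addr + key->len > ent->addr + ent->len)
        return 1;
    return 0;                       /* access lies inside the range */
}

/* Returns the index of the matching entry, or -1 if there is none. */
int demo_find_range(const struct demo_range *tbl, size_t n,
                    uint64_t addr, unsigned len)
{
    struct demo_range key = { addr, len };
    const struct demo_range *hit;

    assert(tbl != NULL);
    hit = bsearch(&key, tbl, n, sizeof(*tbl), demo_range_cmp);
    return hit ? (int)(hit - tbl) : -1;
}

/* Example table, sorted by address; the middle entry is a wildcard. */
const struct demo_range demo_table[] = {
    { 0x1000, 4 }, { 0x2000, 0 }, { 0x3000, 8 },
};
```

Since every lookup on the bus goes through the comparator, the extra branch is paid by all devices, which is why measurements were asked for.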




Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Joe Perches
On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
 
 On 08/25/2015 11:29 PM, Joe Perches wrote:
  On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
   All fields of kvm_io_range were initialized or copied explicitly
   afterwards. So switch to use kmalloc().
  Is there any compiler added alignment padding
  in either structure?  If so, those padding
  areas would now be uninitialized and may leak
  kernel data if copied to user-space.
 
 I get your concern, but I don't see a way to copy them to userspace. Do you?

I didn't look.

I just wanted you to be aware there's a difference
and a reason why kzalloc might be used even though
all structure members are initialized.
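
The difference can be illustrated in user space with calloc() and malloc() standing in for kzalloc() and kmalloc(). This is a minimal sketch with a hypothetical struct demo_range; the amount of padding depends on the ABI (on common 64-bit ABIs there are 4 bytes after len).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct; the compiler may insert padding after len. */
struct demo_range {
    uint64_t addr;
    int      len;
    void    *dev;
};

/* Number of bytes in the struct that belong to no member. */
size_t demo_padding_bytes(void)
{
    return sizeof(struct demo_range)
         - sizeof(uint64_t) - sizeof(int) - sizeof(void *);
}

/* kzalloc()-like: every byte, padding included, starts out as zero. */
struct demo_range *demo_alloc_zeroed(uint64_t addr, int len, void *dev)
{
    struct demo_range *r = calloc(1, sizeof(*r));

    if (!r)
        return NULL;
    r->addr = addr;
    r->len = len;
    r->dev = dev;
    return r;
}

/* kmalloc()-like: all members are set, but any padding bytes keep
 * whatever the allocator handed back. */
struct demo_range *demo_alloc_plain(uint64_t addr, int len, void *dev)
{
    struct demo_range *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    r->addr = addr;
    r->len = len;
    r->dev = dev;
    return r;
}
```

Both allocators give identical member values; they differ only in the padding, which matters solely if the raw bytes of the object are ever copied somewhere less trusted.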




Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang


On 08/26/2015 01:45 PM, Joe Perches wrote:
 On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
  
  On 08/25/2015 11:29 PM, Joe Perches wrote:
   On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().
   Is there any compiler added alignment padding
   in either structure?  If so, those padding
   areas would now be uninitialized and may leak
   kernel data if copied to user-space.
  
  I get your concern, but I don't see a way to copy them to userspace. Do you?
 I didn't look.

 I just wanted you to be aware there's a difference
 and a reason why kzalloc might be used even though
 all structure members are initialized.


I see, thanks for the reminder. Looks like we are safe, and I will add
something like "kvm_io_range is never accessed by userspace" to the
commit log if there's a new version.



Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang


On 08/25/2015 07:51 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
  We register a wildcard MMIO eventfd on two buses: KVM_MMIO_BUS and
  KVM_FAST_MMIO_BUS. This leads to two issues:
  
  - kvm_io_bus_destroy() does not know that the devices on the two buses
    point to a single dev, which leads to a double free [1] during exit.
  - A wildcard eventfd ignores the data length, so it is registered as a
    kvm_io_range with zero length. This makes the binary search in
    kvm_io_bus_get_first_dev() fail when we try to emulate through
    KVM_MMIO_BUS, which causes a userspace I/O emulation request instead
    of an eventfd notification (the virtqueue kick is trapped by qemu
    instead of vhost in this case).
  
  Fix this by not registering the wildcard MMIO eventfd on two buses.
  Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the double
  free in kvm_io_bus_destroy(). For the archs/setups that do not utilize
  KVM_FAST_MMIO_BUS, try KVM_FAST_MMIO_BUS first, before searching
  KVM_MMIO_BUS, to see if it has a match.
  
  [1] Panic caused by double free:
  
  CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
  #28-Ubuntu
  Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
  task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
  RIP: 0010:[c07e25d8]  [c07e25d8] 
  ioeventfd_release+0x28/0x60 [kvm]
  RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
  RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
  RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
  RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
  R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
  R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
  FS:  7fc1ee3e6700() GS:88023e24() 
  knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
  Stack:
  88021e7cc000  88020e7f3be8 c07e2622
  88020e7f3c38 c07df69a 880232524160 88020e792d80
    880219b78c00 0008 8802321686a8
  Call Trace:
  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
  [811f69f7] __fput+0xe7/0x250
  [811f6bae] fput+0xe/0x10
  [81093f04] task_work_run+0xd4/0xf0
  [81079358] do_exit+0x368/0xa50
  [81082c8f] ? recalc_sigpending+0x1f/0x60
  [81079ad5] do_group_exit+0x45/0xb0
  [81085c71] get_signal+0x291/0x750
  [810144d8] do_signal+0x28/0xab0
  [810f3a3b] ? do_futex+0xdb/0x5d0
  [810b7028] ? __wake_up_locked_key+0x18/0x20
  [810f3fa6] ? SyS_futex+0x76/0x170
  [81014fc9] do_notify_resume+0x69/0xb0
  [817cb9af] int_signal+0x12/0x17
  Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 
  20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 
  b8 00 01 10 00 00
  RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
  RSP 88020e7f3bc8
  
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  Changes from V2:
  - Tweak styles and comment suggested by Cornelia.
  Changes from v1:
  - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
needed to save lots of unnecessary changes.
  ---
   virt/kvm/eventfd.c  | 31 +--
   virt/kvm/kvm_main.c | 16 ++--
   2 files changed, 23 insertions(+), 24 deletions(-)
  
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index 9ff4193..c3ffdc3 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
  _ioeventfd *p)
 return false;
   }
   
  -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
  +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
   {
  -  if (flags & KVM_IOEVENTFD_FLAG_PIO)
  +  if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
             return KVM_PIO_BUS;
  -  if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
  +  if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
             return KVM_VIRTIO_CCW_NOTIFY_BUS;
  -  return KVM_MMIO_BUS;
  +  /* When length is ignored, MMIO is put on a separate bus, for
  +   * faster lookups.
  +   */
  +  return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
   }
   
   static int
  @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
  kvm_ioeventfd *args)
 struct eventfd_ctx   *eventfd;
 int   ret;
   
  -  bus_idx = ioeventfd_bus_from_flags(args->flags);
  +  bus_idx = ioeventfd_bus_from_args(args);
 /* must be 

Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang


On 08/25/2015 11:29 PM, Joe Perches wrote:
 On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
  All fields of kvm_io_range were initialized or copied explicitly
  afterwards. So switch to use kmalloc().
 Is there any compiler added alignment padding
 in either structure?  If so, those padding
 areas would now be uninitialized and may leak
 kernel data if copied to user-space.


I get your concern, but I don't see a way to copy them to userspace. Do you?


Re: [PATCH 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang


On 08/25/2015 11:04 AM, Jason Wang wrote:
[...]
 @@ -900,10 +899,11 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   if (!p->wildcard && p->datamatch != args->datamatch)
           continue;
   
  -kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
   if (!p->length) {
           kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
                                     &p->dev);
  +} else {
  +        kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
   }
   }
  Similar comments here... do you want to check for bus_idx ==
  KVM_MMIO_BUS as well?
  Good catch. I think keeping the original code as is will also solve
  this (with the bus_idx changed to KVM_FAST_MMIO_BUS during
  registration if it is a wildcard mmio).
  Do you need to handle the ioeventfd_count changes on the fast mmio bus
  as well?
 Yes. So actually, it needs some changes: checking the return value of
 kvm_io_bus_unregister_dev() and deciding which bus the device belongs to.


Looks like it will be cleaner to just change
ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS accordingly. Will
post V2 soon.


Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Cornelia Huck
On Tue, 25 Aug 2015 15:47:14 +0800
Jason Wang jasow...@redhat.com wrote:

 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..95f2901 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   return false;
  }
 
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)

ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)

  {
 - if (flags & KVM_IOEVENTFD_FLAG_PIO)
 + if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
           return KVM_PIO_BUS;
 - if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 + if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
           return KVM_VIRTIO_CCW_NOTIFY_BUS;
 - return KVM_MMIO_BUS;
 + if (args->len)
 +         return KVM_MMIO_BUS;
 + return KVM_FAST_MMIO_BUS;

Hm...

/* When length is ignored, MMIO is put on a separate bus, for
 * faster lookups.
 */
return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;

  }
 
  static int

This version of the patch looks nice and compact. Regardless whether
you want to follow my (minor) style suggestions, consider this patch

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
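
As a standalone illustration, the bus selection discussed above can be sketched like this. The enum, the struct and the DEMO_FLAG_* values are hypothetical stand-ins for the KVM definitions; only the decision order follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_bus {
    DEMO_MMIO_BUS,
    DEMO_PIO_BUS,
    DEMO_VIRTIO_CCW_NOTIFY_BUS,
    DEMO_FAST_MMIO_BUS,
};

/* Hypothetical flag bits; the real values live in the KVM uapi headers. */
#define DEMO_FLAG_PIO               (1u << 1)
#define DEMO_FLAG_VIRTIO_CCW_NOTIFY (1u << 3)

struct demo_ioeventfd {
    uint64_t addr;
    uint32_t len;    /* 0 means "ignore length" (wildcard) */
    uint32_t flags;
};

/* PIO and CCW are decided by flags alone; for MMIO a zero length selects
 * the separate fast bus, any other length the regular MMIO bus. */
enum demo_bus demo_bus_from_args(const struct demo_ioeventfd *args)
{
    assert(args != NULL);
    if (args->flags & DEMO_FLAG_PIO)
        return DEMO_PIO_BUS;
    if (args->flags & DEMO_FLAG_VIRTIO_CCW_NOTIFY)
        return DEMO_VIRTIO_CCW_NOTIFY_BUS;
    return args->len ? DEMO_MMIO_BUS : DEMO_FAST_MMIO_BUS;
}
```

Because assign and deassign both go through the same function, a wildcard MMIO eventfd is registered on and removed from exactly one bus.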



[PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang
All fields of kvm_io_range are initialized or copied explicitly
afterwards, so switch to kmalloc().

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 virt/kvm/kvm_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..0d79fe8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus 
bus_idx, gpa_t addr,
        if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
                return -ENOSPC;
 
-   new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
+   new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
  sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
@@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum 
kvm_bus bus_idx,
if (r)
return r;
 
-   new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
+   new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
  sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
-- 
2.1.4
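
The reasoning in the commit message can be sketched in user space: when a bus is grown by allocating a new array and then writing the count, the copied entries and the new entry, every member is written explicitly, so zeroing the allocation first is redundant for the members. The struct layout and demo_bus_register() below are hypothetical and much simpler than the kernel code; malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_range {
    uint64_t addr;
    int len;
};

struct demo_bus {
    int dev_count;
    struct demo_range range[];  /* flexible array member, sorted by addr */
};

/* Returns a new bus with (addr, len) inserted in address order, or NULL
 * on allocation failure. old may be NULL (empty bus) and is not freed. */
struct demo_bus *demo_bus_register(const struct demo_bus *old,
                                   uint64_t addr, int len)
{
    int n = old ? old->dev_count : 0;
    int pos = 0;
    struct demo_bus *new_bus;

    new_bus = malloc(sizeof(*new_bus) +
                     (size_t)(n + 1) * sizeof(new_bus->range[0]));
    if (!new_bus)
        return NULL;

    while (pos < n && old->range[pos].addr < addr)
        pos++;

    /* Every member of the new object is written below. */
    new_bus->dev_count = n + 1;
    if (pos > 0)
        memcpy(new_bus->range, old->range,
               (size_t)pos * sizeof(new_bus->range[0]));
    new_bus->range[pos].addr = addr;
    new_bus->range[pos].len = len;
    if (pos < n)
        memcpy(new_bus->range + pos + 1, old->range + pos,
               (size_t)(n - pos) * sizeof(new_bus->range[0]));
    return new_bus;
}
```

Note that "every member is written" says nothing about padding bytes inside the structures; with a plain malloc() those stay uninitialized.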



[PATCH V2 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Jason Wang
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 arch/x86/kvm/trace.h | 17 +
 arch/x86/kvm/vmx.c   |  1 +
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..2d4e81a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
                  __entry->count > 1 ? "(...)" : "")
 );
 
+TRACE_EVENT(kvm_fast_mmio,
+   TP_PROTO(u64 gpa),
+   TP_ARGS(gpa),
+
+   TP_STRUCT__entry(
+           __field(u64,gpa)
+   ),
+
+   TP_fast_assign(
+           __entry->gpa= gpa;
+   ),
+
+   TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
+);
+
+
+
 /*
  * Tracepoint for cpuid.
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..a55d279 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
skip_emulated_instruction(vcpu);
+   trace_kvm_fast_mmio(gpa);
return 1;
}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..36cf78e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
-- 
2.1.4



[PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang
We register a wildcard MMIO eventfd on two buses: KVM_MMIO_BUS and
KVM_FAST_MMIO_BUS. This leads to two issues:

- kvm_io_bus_destroy() does not know that the devices on the two buses
  point to a single dev, which leads to a double free [1] during exit.
- A wildcard eventfd ignores the data length, so it is registered as a
  kvm_io_range with zero length. This makes the binary search in
  kvm_io_bus_get_first_dev() fail when we try to emulate through
  KVM_MMIO_BUS, which causes a userspace I/O emulation request instead
  of an eventfd notification (the virtqueue kick is trapped by qemu
  instead of vhost in this case).

Fix this by not registering the wildcard MMIO eventfd on two buses.
Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the double
free in kvm_io_bus_destroy(). For the archs/setups that do not utilize
KVM_FAST_MMIO_BUS, try KVM_FAST_MMIO_BUS first, before searching
KVM_MMIO_BUS, to see if it has a match.

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
RIP: 0010:[c07e25d8]  [c07e25d8] 
ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
FS:  7fc1ee3e6700() GS:88023e24() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
Stack:
88021e7cc000  88020e7f3be8 c07e2622
88020e7f3c38 c07df69a 880232524160 88020e792d80
  880219b78c00 0008 8802321686a8
Call Trace:
[c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
[c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
[c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
[811f69f7] __fput+0xe7/0x250
[811f6bae] fput+0xe/0x10
[81093f04] task_work_run+0xd4/0xf0
[81079358] do_exit+0x368/0xa50
[81082c8f] ? recalc_sigpending+0x1f/0x60
[81079ad5] do_group_exit+0x45/0xb0
[81085c71] get_signal+0x291/0x750
[810144d8] do_signal+0x28/0xab0
[810f3a3b] ? do_futex+0xdb/0x5d0
[810b7028] ? __wake_up_locked_key+0x18/0x20
[810f3fa6] ? SyS_futex+0x76/0x170
[81014fc9] do_notify_resume+0x69/0xb0
[817cb9af] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 
10 00 00
RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
RSP 88020e7f3bc8

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
  needed to save lots of unnecessary changes.
---
 virt/kvm/eventfd.c  | 30 --
 virt/kvm/kvm_main.c | 16 ++--
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..95f2901 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
_ioeventfd *p)
return false;
 }
 
-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
 {
-   if (flags & KVM_IOEVENTFD_FLAG_PIO)
+   if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
            return KVM_PIO_BUS;
-   if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+   if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
            return KVM_VIRTIO_CCW_NOTIFY_BUS;
-   return KVM_MMIO_BUS;
+   if (args->len)
+           return KVM_MMIO_BUS;
+   return KVM_FAST_MMIO_BUS;
 }
 
 static int
@@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
struct eventfd_ctx   *eventfd;
int   ret;
 
-   bus_idx = ioeventfd_bus_from_flags(args->flags);
+   bus_idx = ioeventfd_bus_from_flags(args);
    /* must be natural-word sized, or 0 to ignore length */
    switch (args->len) {
    case 0:
@@ -843,16 +845,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
    if (ret < 0)
            goto unlock_fail;
 
-   /* When length is ignored, MMIO is also put on a separate bus, for
-    * faster lookups.
-    */
-   if (!args->len && !(args->flags & 

[PATCH v7 16/17] KVM: Warn if 'SN' is set during posting interrupts by software

2015-08-25 Thread Feng Wu
Currently, urgent interrupts are not supported: all interrupts are
treated as non-urgent, so we cannot post interrupts when 'SN' is set.

'SN' is currently only set while the vcpu is scheduled out. A vcpu in
guest mode cannot have been scheduled out, so warn if 'SN' is set in
that case.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/kvm/vmx.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 64e35ea..eb640a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4494,6 +4494,22 @@ static inline bool 
kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_SMP
if (vcpu->mode == IN_GUEST_MODE) {
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   /*
+* Currently, we don't support urgent interrupt,
+* all interrupts are recognized as non-urgent
+* interrupt, so we cannot post interrupts when
+* 'SN' is set.
+*
+* If the vcpu is in guest mode, it means it is
+* running instead of being scheduled out and
+* waiting in the run queue, and that's the only
+* case when 'SN' is set currently, warning if
+* 'SN' is set.
+*/
+   WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
POSTED_INTR_VECTOR);
return true;
-- 
2.1.0



[PATCH v2 1/3] KVM: make halt_poll_ns per-VCPU

2015-08-25 Thread Wanpeng Li
Change halt_poll_ns into a per-VCPU variable, seeded from the module
parameter, to allow greater flexibility.

Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c  | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 81089cf..1bef9e2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -242,6 +242,7 @@ struct kvm_vcpu {
int sigset_active;
sigset_t sigset;
struct kvm_vcpu_stat stat;
+   unsigned int halt_poll_ns;
 
 #ifdef CONFIG_HAS_IOMEM
int mmio_needed;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8db2f8f..93db833 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -217,6 +217,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, 
unsigned id)
vcpu->kvm = kvm;
vcpu->vcpu_id = id;
vcpu->pid = NULL;
+   vcpu->halt_poll_ns = halt_poll_ns;
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);
 
@@ -1930,8 +1931,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
bool waited = false;
 
start = cur = ktime_get();
-   if (halt_poll_ns) {
-   ktime_t stop = ktime_add_ns(ktime_get(), halt_poll_ns);
+   if (vcpu->halt_poll_ns) {
+   ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
do {
/*
-- 
1.9.1
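
The per-VCPU seeding and the bounded poll can be sketched in user space as follows. Everything here is a simplified assumption: demo_halt_poll_ns and its value, struct demo_vcpu and the callback-based ready check are hypothetical, and C11 timespec_get() replaces the kernel's monotonic ktime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical module-wide default; each vCPU takes a copy at init time. */
static unsigned int demo_halt_poll_ns = 500000;

struct demo_vcpu {
    int id;
    unsigned int halt_poll_ns;  /* per-vCPU value, can be tuned later */
};

void demo_vcpu_init(struct demo_vcpu *vcpu, int id)
{
    assert(vcpu != NULL);
    vcpu->id = id;
    vcpu->halt_poll_ns = demo_halt_poll_ns;
}

static uint64_t demo_now_ns(void)
{
    struct timespec ts;

    /* C11 wall clock; good enough for a sketch, not monotonic. */
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Returns 1 if ready() became true within this vCPU's poll window,
 * 0 if polling is disabled or the window expired (caller should block). */
int demo_vcpu_poll(const struct demo_vcpu *vcpu,
                   int (*ready)(void *), void *arg)
{
    uint64_t stop;

    assert(vcpu != NULL && ready != NULL);
    if (!vcpu->halt_poll_ns)
        return 0;

    stop = demo_now_ns() + vcpu->halt_poll_ns;
    do {
        if (ready(arg))
            return 1;
    } while (demo_now_ns() < stop);
    return 0;
}

/* Trivial condition for the example: ready when the int is non-zero. */
int demo_flag_set(void *arg)
{
    return *(int *)arg != 0;
}
```

With the value stored per vCPU, one vCPU can have polling disabled while another keeps the default, which is the flexibility the commit message refers to.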



[PATCH v7 02/17] KVM: Add some helper functions for Posted-Interrupts

2015-08-25 Thread Feng Wu
This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/kvm/vmx.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 271dd70..316f9bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -443,6 +443,8 @@ struct nested_vmx {
 };
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
@@ -483,6 +485,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+   return clear_bit(POSTED_INTR_SN,
+   (unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+   return set_bit(POSTED_INTR_SN,
+   (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+   return test_bit(POSTED_INTR_ON,
+   (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+   return test_bit(POSTED_INTR_SN,
+   (unsigned long *)&pi_desc->control);
+}
+
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
-- 
2.1.0
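
Note on the patch above: the helpers are thin wrappers around atomic bit
operations on the descriptor's control word. A minimal userspace sketch,
assuming a simplified descriptor layout and using GCC/Clang __atomic builtins
in place of the kernel's set_bit()/clear_bit()/test_bit(). All toy_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_POSTED_INTR_ON 0 /* outstanding notification */
#define TOY_POSTED_INTR_SN 1 /* suppress notification */

/* Simplified descriptor: the real layout has more fields. */
struct toy_pi_desc {
    uint32_t pir[8];  /* posted interrupt requests, one bit per vector */
    uint32_t control; /* ON and SN live in the low bits */
};

static void toy_pi_set_sn(struct toy_pi_desc *d)
{
    __atomic_fetch_or(&d->control, 1u << TOY_POSTED_INTR_SN,
                      __ATOMIC_SEQ_CST);
}

static void toy_pi_clear_sn(struct toy_pi_desc *d)
{
    __atomic_fetch_and(&d->control, ~(1u << TOY_POSTED_INTR_SN),
                       __ATOMIC_SEQ_CST);
}

static bool toy_pi_test_on(struct toy_pi_desc *d)
{
    uint32_t c = __atomic_load_n(&d->control, __ATOMIC_SEQ_CST);
    return (c & (1u << TOY_POSTED_INTR_ON)) != 0;
}

static bool toy_pi_test_sn(struct toy_pi_desc *d)
{
    uint32_t c = __atomic_load_n(&d->control, __ATOMIC_SEQ_CST);
    return (c & (1u << TOY_POSTED_INTR_SN)) != 0;
}
```

The operations are atomic because hardware may update the same control word
concurrently while software sets or clears SN.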



[PATCH v7 05/17] KVM: Add interfaces to control PI outside vmx

2015-08-25 Thread Feng Wu
This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/vmx.c  | 13 +
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d50c1d3..c4f99f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -860,6 +860,9 @@ struct kvm_x86_ops {
   gfn_t offset, unsigned long mask);
 
u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+   void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+   void (*pi_set_sn)(struct kvm_vcpu *vcpu);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81a995c..234f720 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -615,6 +615,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
return &(to_vmx(vcpu)->pi_desc);
 }
 
+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+   pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+   pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
 static unsigned long shadow_read_only_fields[] = {
/*
 * We do NOT shadow fields that are modified when L0
@@ -10471,6 +10481,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
.get_pi_desc_addr = vmx_get_pi_desc_addr,
 
+   .pi_clear_sn = vmx_pi_clear_sn,
+   .pi_set_sn = vmx_pi_set_sn,
+
.pmu_ops = &intel_pmu_ops,
 };
 
-- 
2.1.0



[PATCH v7 00/17] Add VT-d Posted-Interrupts support

2015-08-25 Thread Feng Wu
VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrupts Spec. at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v7:
* Define two weak irq bypass callbacks:
  - kvm_arch_irq_bypass_start()
  - kvm_arch_irq_bypass_stop()
* Remove the x86 dummy implementation of the above two functions.
* Print some useful information instead of WARN_ON() when the
  irq bypass consumer unregistration fails.
* Fix an issue when calling pi_pre_block and pi_post_block.

v6:
* Rebase on 4.2.0-rc6
* Rebase on https://lkml.org/lkml/2015/8/6/526 and 
http://www.gossamer-threads.com/lists/linux/kernel/2235623
* Make the add_consumer and del_consumer callbacks static
* Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node'
* Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
* Remove optional dummy callbacks for irq producer

v4:
* For lowest-priority interrupt, only support single-CPU destination
interrupts at the current stage, more common lowest priority support
will be added later.
* According to Marcelo's suggestion, when vCPU is blocked, we handle
the posted-interrupts in the HLT emulation path.
* Some small changes (coding style, typo, add some code comments)

v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
  preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
  can be used to change back to remapping mode.
* Fix typo

v2:
* Use VFIO framework to enable this feature, the VFIO part of this series is
  based on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control
* Rebase this patchset on
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
  then revise some irq logic based on the new hierarchy irqdomain patches
  provided by Jiang Liu jiang@linux.intel.com

Feng Wu (17):
  KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Define a new interface kvm_intr_is_single_vcpu()
  KVM: Get Posted-Interrupts descriptor address from 'struct kvm_vcpu'
  KVM: Add interfaces to control PI outside vmx
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
  vfio: Register/unregister irq_bypass_producer
  KVM: x86: Update IRTE for posted-interrupts
  KVM: Define two weak arch callbacks for irq bypass manager
  KVM: Implement IRQ bypass consumer callbacks for x86
  KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Warn if 'SN' is set during posting interrupts by software
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt |   1 +
 arch/x86/include/asm/kvm_host.h |  20 +++
 arch/x86/kvm/Kconfig|   1 +
 arch/x86/kvm/irq_comm.c |  28 +++-
 arch/x86/kvm/vmx.c  | 288 +++-
 arch/x86/kvm/x86.c  | 167 +++--
 drivers/iommu/irq_remapping.c   |  12 +-
 drivers/vfio/pci/Kconfig|   1 +
 drivers/vfio/pci/vfio_pci_intrs.c   |   9 ++
 drivers/vfio/pci/vfio_pci_private.h |   2 +
 include/linux/kvm_host.h|  28 
 include/linux/kvm_irqfd.h   |   2 +
 virt/kvm/eventfd.c  |  22 ++-
 virt/kvm/irqchip.c  |  10 --
 virt/kvm/kvm_main.c |   3 +
 15 files changed, 565 insertions(+), 29 deletions(-)

-- 
2.1.0



[PATCH v7 06/17] KVM: Make struct kvm_irq_routing_table accessible

2015-08-25 Thread Feng Wu
Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu feng...@intel.com
---
 include/linux/kvm_host.h | 14 ++
 virt/kvm/irqchip.c   | 10 --
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ac8d21..5f183fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -328,6 +328,20 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
 };
 
+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+   u32 nr_rt_entries;
+   /*
+* Array indexed by gsi. Each entry contains list of irq chips
+* the gsi is connected to.
+*/
+   struct hlist_head map[0];
+};
+
+#endif
+
 #ifndef KVM_PRIVATE_MEM_SLOTS
 #define KVM_PRIVATE_MEM_SLOTS 0
 #endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..2cf45d3 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,16 +31,6 @@
 #include <trace/events/kvm.h>
 #include "irq.h"
 
-struct kvm_irq_routing_table {
-   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
-   u32 nr_rt_entries;
-   /*
-* Array indexed by gsi. Each entry contains list of irq chips
-* the gsi is connected to.
-*/
-   struct hlist_head map[0];
-};
-
 int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries, int gsi)
 {
-- 
2.1.0



[PATCH v7 03/17] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-08-25 Thread Feng Wu
This patch defines a new interface kvm_intr_is_single_vcpu(),
which returns whether the interrupt is for a single CPU or not.

It is used by VT-d PI, since we currently only support single-CPU
interrupts. For lowest-priority interrupts, if the user configures
them via /proc/irq or uses irqbalance to make them single-CPU, we
can use PI to deliver the interrupts. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/irq_comm.c | 24 
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49ec903..af11bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
 int x86_set_memory_region(struct kvm *kvm,
  const struct kvm_userspace_memory_region *mem);
 
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+struct kvm_vcpu **dest_vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9efff9e..a9572a13 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -297,6 +297,30 @@ out:
return r;
 }
 
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+struct kvm_vcpu **dest_vcpu)
+{
+   int i, r = 0;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (!kvm_apic_present(vcpu))
+   continue;
+
+   if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+   irq->dest_id, irq->dest_mode))
+   continue;
+
+   r++;
+   *dest_vcpu = vcpu;
+   }
+
+   if (r == 1)
+   return true;
+   else
+   return false;
+}
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,  \
  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
-- 
2.1.0
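
Note on the patch above: the check walks all vCPUs, counts how many match the
interrupt's destination, and reports success only when exactly one matches. A
minimal sketch under assumptions: destination matching is reduced to a bitmask
of APIC IDs, and struct toy_vcpu and toy_intr_is_single_vcpu() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down vCPU. */
struct toy_vcpu {
    unsigned int apic_id; /* must be smaller than the bit width of long */
    bool apic_present;
};

/*
 * Returns true if exactly one present vCPU matches dest_mask. On a match,
 * *dest_vcpu is set to the (last) matching vCPU, as in the patch.
 */
static bool toy_intr_is_single_vcpu(struct toy_vcpu *vcpus, size_t nr_vcpus,
                                    unsigned long dest_mask,
                                    struct toy_vcpu **dest_vcpu)
{
    size_t i, matches = 0;

    for (i = 0; i < nr_vcpus; i++) {
        if (!vcpus[i].apic_present)
            continue;
        if (!(dest_mask & (1ul << vcpus[i].apic_id)))
            continue;
        matches++;
        *dest_vcpu = &vcpus[i];
    }

    return matches == 1;
}
```

Callers should only use *dest_vcpu when the function returns true. With more
than one match it points at the last matching vCPU, which is not meaningful.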



Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang


On 08/25/2015 04:20 PM, Cornelia Huck wrote:
> On Tue, 25 Aug 2015 15:47:14 +0800
> Jason Wang jasow...@redhat.com wrote:
>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index 9ff4193..95f2901 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
>>   return false;
>>  }
>>
>> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
>> +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
> ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)
>
>>  {
>> -if (flags & KVM_IOEVENTFD_FLAG_PIO)
>> +if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
>>   return KVM_PIO_BUS;
>> -if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>> +if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>>   return KVM_VIRTIO_CCW_NOTIFY_BUS;
>> -return KVM_MMIO_BUS;
>> +if (args->len)
>> +return KVM_MMIO_BUS;
>> +return KVM_FAST_MMIO_BUS;
> Hm...
>
> /* When length is ignored, MMIO is put on a separate bus, for
>  * faster lookups.
>  */
> return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
>
>>  }
>>
>>  static int
> This version of the patch looks nice and compact. Regardless whether
> you want to follow my (minor) style suggestions, consider this patch
>
> Acked-by: Cornelia Huck cornelia.h...@de.ibm.com


Thanks for the review. V3 posted :)
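
For reference, the bus selection discussed in this review can be sketched as a
standalone function. This is only an illustration under assumptions: the
toy_* enum, flag values and struct are hypothetical stand-ins for the KVM
definitions, and only the decision order follows the quoted code.

```c
#include <stdint.h>

enum toy_bus {
    TOY_MMIO_BUS,
    TOY_PIO_BUS,
    TOY_VIRTIO_CCW_NOTIFY_BUS,
    TOY_FAST_MMIO_BUS
};

/* Hypothetical flag values, chosen only for this sketch. */
#define TOY_FLAG_PIO               (1u << 0)
#define TOY_FLAG_VIRTIO_CCW_NOTIFY (1u << 1)

struct toy_ioeventfd_args {
    uint32_t len;   /* 0 means "ignore length" (wildcard) */
    uint32_t flags;
};

static enum toy_bus toy_bus_from_args(const struct toy_ioeventfd_args *args)
{
    if (args->flags & TOY_FLAG_PIO)
        return TOY_PIO_BUS;
    if (args->flags & TOY_FLAG_VIRTIO_CCW_NOTIFY)
        return TOY_VIRTIO_CCW_NOTIFY_BUS;
    /* Wildcard (zero-length) MMIO goes on its own bus for faster lookups. */
    return args->len ? TOY_MMIO_BUS : TOY_FAST_MMIO_BUS;
}
```

Passing the whole argument struct rather than just the flags is what lets the
function take the length into account.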


[PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang
We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
and another is KVM_FAST_MMIO_BUS. This leads to two issues:

- kvm_io_bus_destroy() does not know that the devices on the two buses
  point to a single dev, which leads to a double free [1] during exit.
- A wildcard eventfd ignores data len, so it was registered as a
  kvm_io_range with zero length. This makes the binary search in
  kvm_io_bus_get_first_dev() fail when we try to emulate through
  KVM_MMIO_BUS, which causes a userspace io emulation request instead
  of an eventfd notification (the virtqueue kick is trapped by qemu
  instead of vhost in this case).

Fix this by not registering the wildcard mmio eventfd on two
buses. Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the
double free issue in kvm_io_bus_destroy(). For the archs/setups that
do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
KVM_FAST_MMIO_BUS first to see if it has a match.

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
RIP: 0010:[c07e25d8]  [c07e25d8] 
ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
FS:  7fc1ee3e6700() GS:88023e24() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
Stack:
88021e7cc000  88020e7f3be8 c07e2622
88020e7f3c38 c07df69a 880232524160 88020e792d80
  880219b78c00 0008 8802321686a8
Call Trace:
[c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
[c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
[c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
[811f69f7] __fput+0xe7/0x250
[811f6bae] fput+0xe/0x10
[81093f04] task_work_run+0xd4/0xf0
[81079358] do_exit+0x368/0xa50
[81082c8f] ? recalc_sigpending+0x1f/0x60
[81079ad5] do_group_exit+0x45/0xb0
[81085c71] get_signal+0x291/0x750
[810144d8] do_signal+0x28/0xab0
[810f3a3b] ? do_futex+0xdb/0x5d0
[810b7028] ? __wake_up_locked_key+0x18/0x20
[810f3fa6] ? SyS_futex+0x76/0x170
[81014fc9] do_notify_resume+0x69/0xb0
[817cb9af] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 
10 00 00
RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
RSP 88020e7f3bc8

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
Changes from V2:
- Tweak styles and comment suggested by Cornelia.
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
  needed to save lots of unnecessary changes.
---
 virt/kvm/eventfd.c  | 31 +--
 virt/kvm/kvm_main.c | 16 ++--
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..c3ffdc3 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
return false;
 }
 
-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
 {
-   if (flags & KVM_IOEVENTFD_FLAG_PIO)
+   if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
return KVM_PIO_BUS;
-   if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+   if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
return KVM_VIRTIO_CCW_NOTIFY_BUS;
-   return KVM_MMIO_BUS;
+   /* When length is ignored, MMIO is put on a separate bus, for
+* faster lookups.
+*/
+   return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
 }
 
 static int
@@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
struct eventfd_ctx   *eventfd;
int   ret;
 
-   bus_idx = ioeventfd_bus_from_flags(args->flags);
+   bus_idx = ioeventfd_bus_from_args(args);
/* must be natural-word sized, or 0 to ignore length */
switch (args->len) {
case 0:
@@ -843,16 +846,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
if (ret < 0)
goto unlock_fail;
 
-   /* When length 

[PATCH V3 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang
All fields of kvm_io_range are initialized or copied explicitly
afterwards, so switch to kmalloc().

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 virt/kvm/kvm_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..0d79fe8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
return -ENOSPC;
 
-   new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
+   new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
  sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
@@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
if (r)
return r;
 
-   new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
+   new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
  sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
-- 
2.1.4



[PATCH v7 09/17] vfio: Register/unregister irq_bypass_producer

2015-08-25 Thread Feng Wu
This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSI-X on vfio pci devices.

v6:
- Make the add_consumer and del_consumer callbacks static
- Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node'
- Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
- Remove optional dummy callbacks for irq producer

Signed-off-by: Feng Wu feng...@intel.com
---
 drivers/vfio/pci/vfio_pci_intrs.c   | 9 +
 drivers/vfio/pci/vfio_pci_private.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c65299d 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 
if (vdev->ctx[vector].trigger) {
free_irq(irq, vdev->ctx[vector].trigger);
+   irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
vdev->ctx[vector].trigger = NULL;
@@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return ret;
}
 
+   vdev->ctx[vector].producer.token = trigger;
+   vdev->ctx[vector].producer.irq = irq;
+   ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+   if (unlikely(ret))
+   dev_info(&pdev->dev,
+   "irq bypass producer (token %p) registeration fails: %d\n",
+   vdev->ctx[vector].producer.token, ret);
+
vdev->ctx[vector].trigger = trigger;
 
return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@
 
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/irqbypass.h>
 
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
struct virqfd   *mask;
char*name;
boolmasked;
+   struct irq_bypass_producer  producer;
 };
 
 struct vfio_pci_device {
-- 
2.1.0



Re: [PATCH v2 08/15] KVM: arm64: introduce ITS emulation file with stub functions

2015-08-25 Thread Andre Przywara
Salut Eric,



 diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
 index 5269ad1..f5865e7 100644
 --- a/virt/kvm/arm/vgic-v3-emul.c
 +++ b/virt/kvm/arm/vgic-v3-emul.c
 @@ -48,6 +48,7 @@
  #include <asm/kvm_mmu.h>
  
  #include "vgic.h"
 +#include "its-emul.h"
  
  static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
 struct kvm_exit_mmio *mmio, phys_addr_t offset)
 @@ -530,9 +531,20 @@ static bool handle_mmio_ctlr_redist(struct kvm_vcpu 
 *vcpu,
  struct kvm_exit_mmio *mmio,
  phys_addr_t offset)
  {
 -/* since we don't support LPIs, this register is zero for now */
 -vgic_reg_access(mmio, NULL, offset,
 -ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
 +struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +u32 reg;
 +
 +if (!vgic_has_its(vcpu->kvm)) {
 +vgic_reg_access(mmio, NULL, offset,
 +ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
 +return false;
 +}
 can't we remove above block and ...
 +reg = dist->lpis_enabled ? GICR_CTLR_ENABLE_LPIS : 0;
 +vgic_reg_access(mmio, &reg, offset,
 +ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
 +if (!dist->lpis_enabled && (reg & GICR_CTLR_ENABLE_LPIS
 add vgic_has_its(vcpu->kvm) && above?

Yeah, makes some sense. Changed that.
 
 Besides Reviewed-by: Eric Auger eric.au...@linaro.org

Merci!

André

 
 Eric
 )) {
 +/* Eventually do something */
 +}
  return false;
  }
  
 @@ -861,6 +873,12 @@ static int vgic_v3_map_resources(struct kvm *kvm,
  rdbase += GIC_V3_REDIST_SIZE;
  }
  
 +if (vgic_has_its(kvm)) {
 +ret = vits_init(kvm);
 +if (ret)
 +goto out_unregister;
 +}
 +
  dist->redist_iodevs = iodevs;
  dist->ready = true;
  goto out;

 


[PATCH V3 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Jason Wang
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 arch/x86/kvm/trace.h | 17 +
 arch/x86/kvm/vmx.c   |  1 +
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..2d4e81a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
  __entry->count > 1 ? "(...)" : "")
 );
 
+TRACE_EVENT(kvm_fast_mmio,
+   TP_PROTO(u64 gpa),
+   TP_ARGS(gpa),
+
+   TP_STRUCT__entry(
+   __field(u64, gpa)
+   ),
+
+   TP_fast_assign(
+   __entry->gpa = gpa;
+   ),
+
+   TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
+);
+
+
+
 /*
  * Tracepoint for cpuid.
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..a55d279 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
skip_emulated_instruction(vcpu);
+   trace_kvm_fast_mmio(gpa);
return 1;
}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..36cf78e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
-- 
2.1.4



[PATCH v7 13/17] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

2015-08-25 Thread Feng Wu
This patch adds an arch-specific hook 'arch_update' in
'struct kvm_kernel_irqfd'. On the Intel side, it is used to
update the IRTE when VT-d posted-interrupts are used.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  |  5 +
 include/linux/kvm_host.h| 11 +++
 include/linux/kvm_irqfd.h   |  2 ++
 virt/kvm/eventfd.c  | 12 +++-
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3038c1b..22269b4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -176,6 +176,8 @@ enum {
  */
 #define KVM_APIC_PV_EOI_PENDING1
 
+#define __KVM_HAVE_ARCH_IRQFD_INIT 1
+
 struct kvm_kernel_irq_routing_entry;
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be4b561..ef93fdc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8355,6 +8355,11 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
" fails: %d\n", irqfd->consumer.token, ret);
 }
 
+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+   irqfd->arch_update = kvm_arch_update_pi_irte;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5f183fb..f4005dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -34,6 +34,8 @@
 
 #include <asm/kvm_host.h>
 
+struct kvm_kernel_irqfd;
+
 /*
  * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
  * in kvm, other bits are visible for userspace which are defined in
@@ -1145,6 +1147,15 @@ extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
 
+#ifdef __KVM_HAVE_ARCH_IRQFD_INIT
+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd);
+#else
+static inline void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+   irqfd->arch_update = NULL;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index 0c1de05..b7aab52 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -66,6 +66,8 @@ struct kvm_kernel_irqfd {
struct work_struct shutdown;
struct irq_bypass_consumer consumer;
struct irq_bypass_producer *producer;
+   int (*arch_update)(struct kvm *kvm, unsigned int host_irq,
+  uint32_t guest_irq, bool set);
 };
 
 #endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3050b9..b2d9066 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -288,6 +288,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
+   kvm_arch_irqfd_init(irqfd);
seqcount_init(&irqfd->irq_entry_sc);
 
f = fdget(args->fd);
@@ -580,13 +581,22 @@ kvm_irqfd_release(struct kvm *kvm)
  */
 void kvm_irq_routing_update(struct kvm *kvm)
 {
+   int ret;
struct kvm_kernel_irqfd *irqfd;
 
spin_lock_irq(&kvm->irqfds.lock);
 
-   list_for_each_entry(irqfd, &kvm->irqfds.items, list)
+   list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
irqfd_update(kvm, irqfd);
 
+   if (irqfd->arch_update && irqfd->producer) {
+   ret = irqfd->arch_update(
+   irqfd->kvm, irqfd->producer->irq,
+   irqfd->gsi, 1);
+   WARN_ON(ret);
+   }
+   }
+
spin_unlock_irq(&kvm->irqfds.lock);
 }
 
-- 
2.1.0
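
Note on the patch above: the hook is an optional function pointer that generic
code calls only when it is set and a producer is attached. A minimal userspace
sketch of that pattern, under assumptions: struct toy_irqfd, its fields and the
toy_* functions are hypothetical and much simpler than the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down irqfd: only what the hook needs. */
struct toy_irqfd {
    uint32_t gsi;          /* guest interrupt number */
    bool has_producer;     /* true once an irq bypass producer is attached */
    unsigned int host_irq; /* host irq of that producer */
    /* Optional arch hook; NULL means nothing arch-specific to do. */
    int (*arch_update)(unsigned int host_irq, uint32_t guest_irq, bool set);
};

/* Generic default: archs without the hook leave it NULL. */
static void toy_irqfd_init(struct toy_irqfd *irqfd)
{
    irqfd->arch_update = NULL;
}

/* Called on a routing update; returns the hook's result, or 0 if skipped. */
static int toy_routing_update(struct toy_irqfd *irqfd)
{
    if (irqfd->arch_update && irqfd->has_producer)
        return irqfd->arch_update(irqfd->host_irq, irqfd->gsi, true);
    return 0;
}

/* Example arch hook that only records which host irq it was called for. */
static unsigned int toy_last_host_irq;

static int toy_arch_update(unsigned int host_irq, uint32_t guest_irq, bool set)
{
    (void)guest_irq;
    (void)set;
    toy_last_host_irq = host_irq;
    return 0;
}
```

The NULL default keeps the generic path free of arch #ifdefs at the call site.
The caller only has to test the pointer.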



[PATCH v7 11/17] KVM: Define two weak arch callbacks for irq bypass manager

2015-08-25 Thread Feng Wu
Define two weak arch callbacks so that archs that don't need
them don't need to define them.

Signed-off-by: Feng Wu feng...@intel.com
---
 virt/kvm/eventfd.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index d7a230f..f3050b9 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -256,6 +256,16 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
write_seqcount_end(&irqfd->irq_entry_sc);
 }
 
+void __attribute__((weak)) kvm_arch_irq_bypass_stop(
+   struct irq_bypass_consumer *cons)
+{
+}
+
+void __attribute__((weak)) kvm_arch_irq_bypass_start(
+   struct irq_bypass_consumer *cons)
+{
+}
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
-- 
2.1.0
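
Note on the patch above: a weak definition acts as a default that the linker
replaces when an arch supplies a strong definition of the same symbol. A
minimal sketch, assuming a GCC/Clang toolchain that supports
__attribute__((weak)). struct toy_consumer and the toy_* functions are
hypothetical, and bracketing an update with stop/start is only an illustrative
use of the two callbacks.

```c
/* Hypothetical consumer state, for illustration only. */
struct toy_consumer {
    int stopped;        /* set by an arch override of the stop hook, if any */
    int routes_updated; /* number of route updates performed */
};

/* Weak defaults: an arch that needs real work provides a strong definition. */
void __attribute__((weak)) toy_arch_irq_bypass_stop(struct toy_consumer *cons)
{
    (void)cons;
}

void __attribute__((weak)) toy_arch_irq_bypass_start(struct toy_consumer *cons)
{
    (void)cons;
}

/* Generic code calls the hooks unconditionally, with no #ifdef. */
void toy_update_routes(struct toy_consumer *cons)
{
    toy_arch_irq_bypass_stop(cons);
    cons->routes_updated++;
    toy_arch_irq_bypass_start(cons);
}
```

With only the weak defaults linked in, the hooks are no-ops and the generic
update still runs.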



[PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts

2015-08-25 Thread Feng Wu
This patch adds the routine to update the IRTE for posted-interrupts
when the guest changes the interrupt configuration.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/kvm/x86.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ef2560..8f09a76 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
 #include <asm/fpu/internal.h> /* Ugh! */
 #include <asm/pvclock.h>
 #include <asm/div64.h>
+#include <asm/irq_remapping.h>
 
 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
@@ -8248,6 +8249,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
+/*
+ * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+   uint32_t guest_irq, bool set)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   struct kvm_irq_routing_table *irq_rt;
+   struct kvm_lapic_irq irq;
+   struct kvm_vcpu *vcpu;
+   struct vcpu_data vcpu_info;
+   int idx, ret = -EINVAL;
+
+   if (!irq_remapping_cap(IRQ_POSTING_CAP))
+   return 0;
+
+   idx = srcu_read_lock(&kvm->irq_srcu);
+   irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+   BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+   hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+   if (e->type != KVM_IRQ_ROUTING_MSI)
+   continue;
+   /*
+* VT-d PI cannot support posting multicast/broadcast
+* interrupts to a VCPU, we still use interrupt remapping
+* for these kind of interrupts.
+*
+* For lowest-priority interrupts, we only support
+* those with single CPU as the destination, e.g. user
+* configures the interrupts via /proc/irq or uses
+* irqbalance to make the interrupts single-CPU.
+*
+* We will support full lowest-priority interrupt later.
+*
+*/
+
+   kvm_set_msi_irq(e, irq);
+   if (!kvm_intr_is_single_vcpu(kvm, irq, vcpu))
+   continue;
+
+   vcpu_info.pi_desc_addr = kvm_x86_ops-get_pi_desc_addr(vcpu);
+   vcpu_info.vector = irq.vector;
+
+   if (set)
+   ret = irq_set_vcpu_affinity(host_irq, vcpu_info);
+   else {
+   /* suppress notification event before unposting */
+   kvm_x86_ops-pi_set_sn(vcpu);
+   ret = irq_set_vcpu_affinity(host_irq, NULL);
+   kvm_x86_ops-pi_clear_sn(vcpu);
+   }
+
+   if (ret  0) {
+   printk(KERN_INFO %s: failed to update PI IRTE\n,
+   __func__);
+   goto out;
+   }
+   }
+
+   ret = 0;
+out:
+   srcu_read_unlock(kvm-irq_srcu, idx);
+   return ret;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
-- 
2.1.0



[PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices

2015-08-25 Thread Feng Wu
Enable irq bypass manager for vfio PCI devices.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 drivers/vfio/pci/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
 	tristate "VFIO support for PCI devices"
 	depends on VFIO && PCI && EVENTFD
 	select VFIO_VIRQFD
+	select IRQ_BYPASS_MANAGER
 	help
 	  Support for the PCI VFIO bus driver.  This is required to make
 	  use of PCI drivers using the VFIO framework.
-- 
2.1.0



[PATCH v7 12/17] KVM: Implement IRQ bypass consumer callbacks for x86

2015-08-25 Thread Feng Wu
Implement the following callbacks for x86:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop: dummy callback
- kvm_arch_irq_bypass_resume: dummy callback

and set CONFIG_HAVE_KVM_IRQ_BYPASS for x86.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig|  1 +
 arch/x86/kvm/x86.c  | 34 ++
 3 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 82d0709..3038c1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/perf_event.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/clocksource.h>
+#include <linux/irqbypass.h>
 
 #include <asm/pvclock-abi.h>
 #include <asm/desc.h>
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index c951d44..b90776f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
 	select IRQ_BYPASS_MANAGER
+	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f09a76..be4b561 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,6 +50,8 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -8321,6 +8323,38 @@ out:
 	return ret;
 }
 
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	irqfd->producer = prod;
+
+	return kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 1);
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	int ret;
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	irqfd->producer = NULL;
+
+	/*
+	 * When producer of consumer is unregistered, we change back to
+	 * remapped mode, so we can re-use the current implementation
+	 * when the irq is masked/disabled or the consumer side (KVM
+	 * in this case) doesn't want to receive the interrupts.
+	 */
+	ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
+	if (ret)
+		printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
+		       " fails: %d\n", irqfd->consumer.token, ret);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
-- 
2.1.0



[PATCH v7 14/17] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

2015-08-25 Thread Feng Wu
This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 234f720..9c87064 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
 #include <asm/debugreg.h>
 #include <asm/kexec.h>
 #include <asm/apic.h>
+#include <asm/irq_remapping.h>
 
 #include "trace.h"
 #include "pmu.h"
@@ -2001,10 +2002,60 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 		vmx->loaded_vmcs->cpu = cpu;
 	}
+
+	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+		struct pi_desc old, new;
+		unsigned int dest;
+
+		do {
+			old.control = new.control = pi_desc->control;
+
+			/*
+			 * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
+			 * are two possible cases:
+			 * 1. After running 'pi_pre_block', context switch
+			 *    happened. For this case, 'sn' was set in
+			 *    vmx_vcpu_put(), so we need to clear it here.
+			 * 2. After running 'pi_pre_block', we were blocked,
+			 *    and woken up by some other guy. For this case,
+			 *    we don't need to do anything, 'pi_post_block'
+			 *    will do everything for us. However, we cannot
+			 *    check whether it is case #1 or case #2 here
+			 *    (maybe, not needed), so we also clear sn here,
+			 *    I think it is not a big deal.
+			 */
+			if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
+				if (vcpu->cpu != cpu) {
+					dest = cpu_physical_id(cpu);
+
+					if (x2apic_enabled())
+						new.ndst = dest;
+					else
+						new.ndst = (dest << 8) & 0xFF00;
+				}
+
+				/* set 'NV' to 'notification vector' */
+				new.nv = POSTED_INTR_VECTOR;
+			}
+
+			/* Allow posting non-urgent interrupts */
+			new.sn = 0;
+		} while (cmpxchg(&pi_desc->control, old.control,
+				new.control) != old.control);
+	}
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		/* Set SN when the vCPU is preempted */
+		if (vcpu->preempted)
+			pi_set_sn(pi_desc);
+	}
+
 	__vmx_load_host_state(to_vmx(vcpu));
 	if (!vmm_exclusive) {
 		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
-- 
2.1.0
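
For readers following along outside the kernel tree, the sched-in update in
this patch can be sketched in user space with C11 atomics. This is only an
illustration under stated assumptions: the control word is modelled as a
plain 64-bit integer with SN in bit 1, NV in bits 23:16 and NDST in bits
63:32 (the positions listed in patch 01/17), and the names pi_sched_in(),
ndst_encode() and the two SKETCH_* vector values are hypothetical, not
kernel symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SN_MASK     (UINT64_C(1) << 1)
#define NV_SHIFT    16
#define NV_MASK     (UINT64_C(0xff) << NV_SHIFT)
#define NDST_SHIFT  32
#define NDST_MASK   (UINT64_C(0xffffffff) << NDST_SHIFT)

/* Hypothetical vector numbers, chosen only for this example. */
#define SKETCH_NOTIFY_VECTOR  0xf2u
#define SKETCH_WAKEUP_VECTOR  0xf1u

/* xAPIC mode keeps the APIC ID in bits 15:8; x2APIC uses the full ID. */
static uint32_t ndst_encode(uint32_t apic_id, bool x2apic)
{
    return x2apic ? apic_id : ((apic_id << 8) & 0xFF00u);
}

/*
 * Sched-in: clear SN and, unless the descriptor is parked on the wakeup
 * vector, retarget NDST (if the vCPU migrated) and restore the normal
 * notification vector. All fields change in one compare-and-swap so a
 * concurrent writer never observes a half-updated control word.
 */
static void pi_sched_in(_Atomic uint64_t *control, bool migrated,
                        uint32_t apic_id, bool x2apic)
{
    uint64_t old, new;

    assert(control);
    old = atomic_load(control);
    do {
        new = old;
        if (((old & NV_MASK) >> NV_SHIFT) != SKETCH_WAKEUP_VECTOR) {
            if (migrated) {
                new &= ~NDST_MASK;
                new |= (uint64_t)ndst_encode(apic_id, x2apic) << NDST_SHIFT;
            }
            new &= ~NV_MASK;
            new |= (uint64_t)SKETCH_NOTIFY_VECTOR << NV_SHIFT;
        }
        new &= ~SN_MASK;
        /* On failure 'old' is reloaded with the current value; retry. */
    } while (!atomic_compare_exchange_weak(control, &old, new));
}
```

The retry loop plays the role of the cmpxchg() loop in vmx_vcpu_load(): if
another agent changes the word between the read and the swap, the new value
is recomputed from the fresh contents.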



[PATCH v7 01/17] KVM: Extend struct pi_desc for VT-d Posted-Interrupts

2015-08-25 Thread Feng Wu
Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..271dd70 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,8 +446,24 @@ struct nested_vmx {
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
-	u32 control;	/* bit 0 of control is outstanding notification bit */
-	u32 rsvd[7];
+	union {
+		struct {
+				/* bit 256 - Outstanding Notification */
+			u16	on	: 1,
+				/* bit 257 - Suppress Notification */
+				sn	: 1,
+				/* bit 271:258 - Reserved */
+				rsvd_1	: 14;
+				/* bit 279:272 - Notification Vector */
+			u8	nv;
+				/* bit 287:280 - Reserved */
+			u8	rsvd_2;
+				/* bit 319:288 - Notification Destination */
+			u32	ndst;
+		};
+		u64 control;
+	};
+	u32 rsvd[6];
 } __aligned(64);
 
 static bool pi_test_and_set_on(struct pi_desc *pi_desc)
-- 
2.1.0
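
As a reading aid, the bit positions named in the comments above can be
written out with explicit shifts instead of bit-fields. This is a sketch,
not the kernel definition: struct pi_desc_sketch and the pi_control_*()
helpers are hypothetical names, and the only facts taken from the patch
are the field positions (descriptor bit 256 is bit 0 of the 64-bit control
word) and the 64-byte size and alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions inside the 64-bit control word (descriptor bits 319:256). */
#define PI_ON_BIT      0   /* descriptor bit 256: Outstanding Notification */
#define PI_SN_BIT      1   /* descriptor bit 257: Suppress Notification */
#define PI_NV_SHIFT    16  /* descriptor bits 279:272: Notification Vector */
#define PI_NDST_SHIFT  32  /* descriptor bits 319:288: Notification Destination */

/* Hypothetical stand-in for struct pi_desc, for illustration only. */
struct pi_desc_sketch {
    uint32_t pir[8];    /* 256 posted-interrupt request bits */
    uint64_t control;   /* ON, SN, NV and NDST packed as above */
    uint32_t rsvd[6];
} __attribute__((aligned(64)));

static_assert(sizeof(struct pi_desc_sketch) == 64,
              "descriptor must occupy exactly 64 bytes");

/* Build a control word from its individual fields. */
static inline uint64_t pi_control_pack(int on, int sn, uint8_t nv, uint32_t ndst)
{
    return ((uint64_t)(on & 1) << PI_ON_BIT) |
           ((uint64_t)(sn & 1) << PI_SN_BIT) |
           ((uint64_t)nv << PI_NV_SHIFT) |
           ((uint64_t)ndst << PI_NDST_SHIFT);
}

static inline int pi_control_sn(uint64_t c)
{
    return (int)((c >> PI_SN_BIT) & 1);
}

static inline uint8_t pi_control_nv(uint64_t c)
{
    return (uint8_t)(c >> PI_NV_SHIFT);
}

static inline uint32_t pi_control_ndst(uint64_t c)
{
    return (uint32_t)(c >> PI_NDST_SHIFT);
}
```

Widening control from u32 to u64 is what lets ON, SN, NV and NDST be read
and updated together in one 64-bit atomic operation, which the later
patches in the series rely on.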



[PATCH v7 17/17] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

2015-08-25 Thread Feng Wu
Enable VT-d Posted-Interrupts and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 Documentation/kernel-parameters.txt |  1 +
 drivers/iommu/irq_remapping.c   | 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..52aca36 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1547,6 +1547,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			nosid	disable Source ID checking
 			no_x2apic_optout
 				BIOS x2APIC opt-out request will be ignored
+			nopost	disable Interrupt Posting
 
 	iomem=		Disable strict checking of access to MMIO memory
 		strict	regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d99930..d8c3997 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -22,7 +22,7 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
-int disable_irq_post = 1;
+int disable_irq_post = 0;
 
 static int disable_irq_remap;
 static struct irq_remap_ops *remap_ops;
@@ -58,14 +58,18 @@ static __init int setup_irqremap(char *str)
 		return -EINVAL;
 
 	while (*str) {
-		if (!strncmp(str, "on", 2))
+		if (!strncmp(str, "on", 2)) {
 			disable_irq_remap = 0;
-		else if (!strncmp(str, "off", 3))
+			disable_irq_post = 0;
+		} else if (!strncmp(str, "off", 3)) {
 			disable_irq_remap = 1;
-		else if (!strncmp(str, "nosid", 5))
+			disable_irq_post = 1;
+		} else if (!strncmp(str, "nosid", 5))
 			disable_sourceid_checking = 1;
 		else if (!strncmp(str, "no_x2apic_optout", 16))
 			no_x2apic_optout = 1;
+		else if (!strncmp(str, "nopost", 6))
+			disable_irq_post = 1;
 
 		str += strcspn(str, ",");
 		while (*str == ',')
-- 
2.1.0
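
To see how the comma-separated option string is consumed after this change,
here is a small user-space sketch of the same loop. It is an illustration
only: struct irqremap_opts and parse_intremap() are hypothetical names that
stand in for the file-scope globals and setup_irqremap(); the matching and
skipping logic follows the hunk above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the globals the patch touches. */
struct irqremap_opts {
    int disable_irq_remap;
    int disable_irq_post;
    int disable_sourceid_checking;
    int no_x2apic_optout;
};

/* Walk a comma-separated option string, one token per iteration. */
static void parse_intremap(const char *str, struct irqremap_opts *o)
{
    assert(str && o);

    while (*str) {
        if (!strncmp(str, "on", 2)) {
            o->disable_irq_remap = 0;
            o->disable_irq_post = 0;
        } else if (!strncmp(str, "off", 3)) {
            /* Posting depends on remapping, so "off" disables both. */
            o->disable_irq_remap = 1;
            o->disable_irq_post = 1;
        } else if (!strncmp(str, "nosid", 5)) {
            o->disable_sourceid_checking = 1;
        } else if (!strncmp(str, "no_x2apic_optout", 16)) {
            o->no_x2apic_optout = 1;
        } else if (!strncmp(str, "nopost", 6)) {
            o->disable_irq_post = 1;
        }

        /* Skip the rest of this token and any separating commas. */
        str += strcspn(str, ",");
        while (*str == ',')
            str++;
    }
}
```

With this shape, a string such as "on,nopost" keeps interrupt remapping
enabled while turning posting off, and "off" turns both off.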



[PATCH v7 15/17] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2015-08-25 Thread Feng Wu
This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   5 ++
 arch/x86/kvm/vmx.c  | 151 
 arch/x86/kvm/x86.c  |  55 ---
 include/linux/kvm_host.h|   3 +
 virt/kvm/kvm_main.c |   3 +
 5 files changed, 207 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 22269b4..32af275 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -554,6 +554,8 @@ struct kvm_vcpu_arch {
 */
bool write_fault_to_shadow_pgtable;
 
+   bool halted;
+
/* set at EPT violation at this point */
unsigned long exit_qualification;
 
@@ -868,6 +870,9 @@ struct kvm_x86_ops {
 
void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
void (*pi_set_sn)(struct kvm_vcpu *vcpu);
+
+   int (*pi_pre_block)(struct kvm_vcpu *vcpu);
+   void (*pi_post_block)(struct kvm_vcpu *vcpu);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9c87064..64e35ea 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -888,6 +888,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
+/*
+ * We maintain a per-CPU linked-list of vCPU, so in wakeup_handler() we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
@@ -2981,6 +2988,8 @@ static int hardware_enable(void)
 		return -EBUSY;
 
 	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
 	/*
 	 * Now we can enable the vmclear operation in kdump
@@ -6106,6 +6115,25 @@ static void update_ple_window_actual_max(void)
 					    ple_window_grow, INT_MIN);
 }
 
+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+	struct kvm_vcpu *vcpu;
+	int cpu = smp_processor_id();
+
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+			blocked_vcpu_list) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		if (pi_test_on(pi_desc) == 1)
+			kvm_vcpu_kick(vcpu);
+	}
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
 static __init int hardware_setup(void)
 {
 	int r = -ENOMEM, i, msr;
@@ -6290,6 +6318,8 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
 	return alloc_kvm_area();
 
 out8:
@@ -10414,6 +10444,124 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
 }
 
+/*
+ * This routine does the following things for vCPU which is going
+ * to be blocked if VT-d PI is enabled.
+ * - Store the vCPU to the wakeup list, so when interrupts happen
+ *   we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ *      'NDST' <-- vcpu->pre_pcpu
+ *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ *   interrupt is posted for this vCPU, we cannot block it, in
+ *   this case, return 1, otherwise, return 0.
+ *
+ */
+static int vmx_pi_pre_block(struct kvm_vcpu *vcpu)
+{
+	unsigned long flags;
+	unsigned int dest;
+	struct pi_desc old, new;
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	vcpu->pre_pcpu = vcpu->cpu;
+	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+			  vcpu->pre_pcpu), flags);
+	list_add_tail(&vcpu->blocked_vcpu_list,
+		      &per_cpu(blocked_vcpu_on_cpu,
+		      vcpu->pre_pcpu));
+	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+			       vcpu->pre_pcpu), flags);
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		/*
+		 * We should not block the vCPU if
+		 * an interrupt is posted for it.
+

[PATCH v7 04/17] KVM: Get Posted-Interrupts descriptor address from 'struct kvm_vcpu'

2015-08-25 Thread Feng Wu
Define an interface to get PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx.c  | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af11bca..d50c1d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,6 +858,8 @@ struct kvm_x86_ops {
 	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
 					   struct kvm_memory_slot *slot,
 					   gfn_t offset, unsigned long mask);
+
+	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 316f9bf..81a995c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -610,6 +610,10 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
 				[number##_HIGH] = VMCS12_OFFSET(name)+4
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+	return &(to_vmx(vcpu)->pi_desc);
+}
 
 static unsigned long shadow_read_only_fields[] = {
 	/*
@@ -4487,6 +4491,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+	return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -10460,6 +10469,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.flush_log_dirty = vmx_flush_log_dirty,
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 
+	.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
 	.pmu_ops = &intel_pmu_ops,
 };
 
-- 
2.1.0



[PATCH v7 07/17] KVM: make kvm_set_msi_irq() public

2015-08-25 Thread Feng Wu
Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 4 
 arch/x86/kvm/irq_comm.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c4f99f1..82d0709 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
  */
 #define KVM_APIC_PV_EOI_PENDING1
 
+struct kvm_kernel_irq_routing_entry;
+
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
  * enough memory for a single page fault in a cache.
@@ -1212,4 +1214,6 @@ int x86_set_memory_region(struct kvm *kvm,
 bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 struct kvm_vcpu **dest_vcpu);
 
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+struct kvm_lapic_irq *irq);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index a9572a13..1319c60 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -91,8 +91,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
-				   struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq)
 {
 	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
 
-- 
2.1.0



Re: [PATCH v2 09/15] KVM: arm64: implement basic ITS register handlers

2015-08-25 Thread Andre Przywara
Hi Eric,



 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index 659dd39..b498f06 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -32,10 +32,62 @@
  #include "vgic.h"
  #include "its-emul.h"

 +#define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL)
 +
 +/* The distributor lock is held by the VGIC MMIO handler. */
  static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
 				   struct kvm_exit_mmio *mmio,
 				   phys_addr_t offset)
  {
 +	struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
 +	u32 reg;
 +	bool was_enabled;
 +
 +	switch (offset & ~3) {
 +	case 0x00:		/* GITS_CTLR */
 +		/* We never defer any command execution. */
 +		reg = GITS_CTLR_QUIESCENT;
 +		if (its->enabled)
 +			reg |= GITS_CTLR_ENABLE;
 +		was_enabled = its->enabled;
 +		vgic_reg_access(mmio, &reg, offset & 3,
 +				ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
 +		its->enabled = !!(reg & GITS_CTLR_ENABLE);
 +		return !was_enabled && its->enabled;
 +	case 0x04:		/* GITS_IIDR */
 +		reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
 +		vgic_reg_access(mmio, &reg, offset & 3,
 +				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
 +		break;
 +	case 0x08:		/* GITS_TYPER */
 +		/*
 +		 * We use linear CPU numbers for redistributor addressing,
 +		 * so GITS_TYPER.PTA is 0.
 +		 * To avoid memory waste on the guest side, we keep the
 +		 * number of IDBits and DevBits low for the time being.
 +		 * This could later be made configurable by userland.
 +		 * Since we have all collections in linked list, we claim
 +		 * that we can hold all of the collection tables in our
 +		 * own memory and that the ITT entry size is 1 byte (the
 +		 * smallest possible one).
 +		 */
 +		reg = GITS_TYPER_PLPIS;
 +		reg |= 0xff << GITS_TYPER_HWCOLLCNT_SHIFT;
 +		reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT;
 +		reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT;
 +		vgic_reg_access(mmio, &reg, offset & 3,
 +				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
 +		break;
 +	case 0x0c:
 +		/* The upper 32bits of TYPER are all 0 for the time being.
 +		 * Should we need more than 256 collections, we can enable
 +		 * some bits in here.
 +		 */
 +		vgic_reg_access(mmio, NULL, offset & 3,
 +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
 +		break;
 +	}
 +
  	return false;
  }

 @@ -43,20 +95,142 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
  				    struct kvm_exit_mmio *mmio,
  				    phys_addr_t offset)
  {
 +	u32 reg = 0;
 +	int idreg = (offset & ~3) + GITS_IDREGS_BASE;
 +
 +	switch (idreg) {
 +	case GITS_PIDR2:
 +		reg = GIC_PIDR2_ARCH_GICv3;
 +		break;
 +	case GITS_PIDR4:
 +		/* This is a 64K software visible page */
 +		reg = 0x40;
 +		break;
 +	/* Those are the ID registers for (any) GIC. */
 +	case GITS_CIDR0:
 +		reg = 0x0d;
 +		break;
 +	case GITS_CIDR1:
 +		reg = 0xf0;
 +		break;
 +	case GITS_CIDR2:
 +		reg = 0x05;
 +		break;
 +	case GITS_CIDR3:
 +		reg = 0xb1;
 +		break;
 +	}
 +	vgic_reg_access(mmio, &reg, offset & 3,
 +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
  	return false;
  }

 +static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd)
 +{
 +	return -ENODEV;
 +}
 +
  static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu,
  				    struct kvm_exit_mmio *mmio,
  				    phys_addr_t offset)
  {
 +	struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
 +	int mode = ACCESS_READ_VALUE;
 +
 +	mode |= its->enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
 +
 +	vgic_handle_base_register(vcpu, mmio, offset, &its->cbaser, mode);
 +
 +	/* Writing CBASER resets the read pointer. */
 +	if (mmio->is_write)
 +		its->creadr = 0;
 +
  	return false;
  }

 +static int its_cmd_buffer_size(struct kvm *kvm)
 +{
 +	struct vgic_its *its = &kvm->arch.vgic.its;
 +
 +	return ((its->cbaser & 0xff) + 1) << 12;
 +}
 +
 +static gpa_t its_cmd_buffer_base(struct kvm *kvm)
 +{
 +	struct vgic_its *its = &kvm->arch.vgic.its;
 +
 +	return BASER_BASE_ADDRESS(its->cbaser);
 +}
 +
 +/*
 + * By writing to CWRITER the guest announces new commands to be processed.
 + * Since we cannot read from guest memory inside the ITS 

Re: KVM slow LAMP guest

2015-08-25 Thread Wanpeng Li

On 8/25/15 11:42 PM, Hansa wrote:

On 24-8-2015 1:26, Wanpeng Li wrote:

On 8/24/15 3:18 AM, Hansa wrote:

On 16-7-2015 13:27, Paolo Bonzini wrote:

On 15/07/2015 22:02, C. Bröcker wrote:
What OS is this?  Is it RHEL/CentOS? If so, halt_poll_ns will be 
in 6.7

which will be out in a few days/weeks.

Paolo

OK. As said CentOS 6.6.
But where do I put this parameter?

You can add kvm.halt_poll_ns=50 to the kernel command line.  If
you have the parameter, you have the
/sys/module/kvm/parameters/halt_poll_ns file.

Hi,

I upgraded to the CentOS 6.7 release which came out last month and 
as promised the halt_poll_ns parameter was available.
Last week I tested the availability status every 5 minutes on my 
Wordpress VM's with the halt_poll_ns kernel param set on DOM0. I'm 
pleased to announce that it solves the problem!


How much seconds to load your Wordpress site this time?

Regards,
Wanpeng Li
The average is around 0.4 seconds to load my heaviest site on my 
slowest machine.


Nice!



On the VM server I issued the command below every eleven minutes:

date >> curltest-file; _
top -b -n 1 | sed -n '7,12p' >> curltest-file; _
curl -o /dev/null -s -w"time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file


This gives me the total time for displaying my site on a local 
machine. It also includes a 'top' command to display which processes 
are running at each sample. All is saved in a file called curltest-file.


I found 7 occurrences in my curltest-file of a time_total larger than 
20 seconds. Top however didn't show any significant CPU or IO activity 
at those sampled times. Further investigations shows me that they are 
related to a known (gravatar)  issue in the Wordpress Jetpack plugin. 
I didn't include these samples in the average total.


If you just use halt_poll_ns or both halt_poll_ns and idle=poll in guest?

Regards,
Wanpeng Li



Cheers and good luck tweaking your sites!
Best, Hansa


Re: KVM slow LAMP guest

2015-08-25 Thread Hansa

On 26-8-2015 0:33, Wanpeng Li wrote:

On the VM server I issued the command below every eleven minutes:

date >> curltest-file; _
top -b -n 1 | sed -n '7,12p' >> curltest-file; _
curl -o /dev/null -s -w"time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file

This gives me the total time for displaying my site on a local machine. It also 
includes a 'top' command to display which processes are running at each sample. 
All is saved in a file called curltest-file.

I found 7 occurrences in my curltest-file of a time_total larger than 20 
seconds. Top however didn't show any significant CPU or IO activity at those 
sampled times. Further investigations shows me that they are related to a known 
(gravatar)  issue in the Wordpress Jetpack plugin. I didn't include these 
samples in the average total.


If you just use halt_poll_ns or both halt_poll_ns and idle=poll in guest?


I just use kvm.halt_poll_ns=50
Should I try some different tests?


Re: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts

2015-08-25 Thread Alex Williamson
On Tue, 2015-08-25 at 16:50 +0800, Feng Wu wrote:
> This patch adds the routine to update IRTE for posted-interrupts
> when guest changes the interrupt configuration.
>
> Signed-off-by: Feng Wu <feng...@intel.com>
> ---
>  arch/x86/kvm/x86.c | 73 ++
>  1 file changed, 73 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5ef2560..8f09a76 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -63,6 +63,7 @@
>  #include <asm/fpu/internal.h> /* Ugh! */
>  #include <asm/pvclock.h>
>  #include <asm/div64.h>
> +#include <asm/irq_remapping.h>
>
>  #define MAX_IO_MSRS 256
>  #define KVM_MAX_MCE_BANKS 32
> @@ -8248,6 +8249,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
>
> +/*
> + * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
> + *
> + * @kvm: kvm
> + * @host_irq: host irq of the interrupt
> + * @guest_irq: gsi of the interrupt
> + * @set: set or unset PI
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> +			    uint32_t guest_irq, bool set)
> +{
> +	struct kvm_kernel_irq_routing_entry *e;
> +	struct kvm_irq_routing_table *irq_rt;
> +	struct kvm_lapic_irq irq;
> +	struct kvm_vcpu *vcpu;
> +	struct vcpu_data vcpu_info;
> +	int idx, ret = -EINVAL;
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return 0;
> +
> +	idx = srcu_read_lock(&kvm->irq_srcu);
> +	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> +	BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
> +
> +	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> +		if (e->type != KVM_IRQ_ROUTING_MSI)
> +			continue;
> +		/*
> +		 * VT-d PI cannot support posting multicast/broadcast
> +		 * interrupts to a VCPU, we still use interrupt remapping
> +		 * for these kind of interrupts.
> +		 *
> +		 * For lowest-priority interrupts, we only support
> +		 * those with single CPU as the destination, e.g. user
> +		 * configures the interrupts via /proc/irq or uses
> +		 * irqbalance to make the interrupts single-CPU.
> +		 *
> +		 * We will support full lowest-priority interrupt later.
> +		 *
> +		 */
> +
> +		kvm_set_msi_irq(e, &irq);
> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> +			continue;
> +
> +		vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> +		vcpu_info.vector = irq.vector;
> +
> +		if (set)
> +			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
> +		else {
> +			/* suppress notification event before unposting */
> +			kvm_x86_ops->pi_set_sn(vcpu);
> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> +			kvm_x86_ops->pi_clear_sn(vcpu);
> +		}

Can we add trace events so that we have a way to tell when PI is being
enabled/disabled other than performance heuristics?  Thanks,

Alex

> +
> +		if (ret < 0) {
> +			printk(KERN_INFO "%s: failed to update PI IRTE\n",
> +					__func__);
> +			goto out;
> +		}
> +	}
> +
> +	ret = 0;
> +out:
> +	srcu_read_unlock(&kvm->irq_srcu, idx);
> +	return ret;
> +}
> +
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);





Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Cornelia Huck
On Tue, 25 Aug 2015 17:05:47 +0800
Jason Wang jasow...@redhat.com wrote:

 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another is KVM_FAST_MMIO_BUS. This leads to issue:
 
 - kvm_io_bus_destroy() knows nothing about the devices on two buses
   points to a single dev. Which will lead double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause userspace io emulation request instead
   of a eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fixing this by don't register wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see it it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 Changes from V2:
 - Tweak styles and comment suggested by Cornelia.
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 31 +--
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 23 insertions(+), 24 deletions(-)

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
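
For readers following the second point of the quoted description, here is a
minimal sketch of how a containment-style range comparison can miss a
zero-length registration during a binary search. The names io_range,
range_cmp and bus_lookup are hypothetical and the model is simplified; it is
not the code in virt/kvm/kvm_main.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a registered I/O range. */
struct io_range {
	uint64_t addr;
	int len;
};

/*
 * Returns 0 when the key range lies inside the registered range,
 * a negative value when it starts below it, a positive value otherwise.
 */
static int range_cmp(const void *k, const void *r)
{
	const struct io_range *key = k;
	const struct io_range *reg = r;

	if (key->addr < reg->addr)
		return -1;
	if (key->addr + key->len > reg->addr + reg->len)
		return 1;
	return 0;
}

/* Binary search over a table sorted by address. */
static const struct io_range *bus_lookup(const struct io_range *tbl, size_t n,
					 uint64_t addr, int len)
{
	struct io_range key = { .addr = addr, .len = len };

	return bsearch(&key, tbl, n, sizeof(*tbl), range_cmp);
}
```

In this model, a 4-byte access at the address of a zero-length entry never
compares equal to that entry, so the lookup falls through; only a zero-length
key matches it. That is the reason a separate bus for wildcard entries is
attractive.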



Re: [PATCH V3 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 05:05:48PM +0800, Jason Wang wrote:
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  arch/x86/kvm/trace.h | 17 +
  arch/x86/kvm/vmx.c   |  1 +
  arch/x86/kvm/x86.c   |  1 +
  3 files changed, 19 insertions(+)
 
 diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
 index 4eae7c3..2d4e81a 100644
 --- a/arch/x86/kvm/trace.h
 +++ b/arch/x86/kvm/trace.h
 @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
 		  __entry->count > 1 ? "(...)" : "")
  );
  
 +TRACE_EVENT(kvm_fast_mmio,
 +	TP_PROTO(u64 gpa),
 +	TP_ARGS(gpa),
 +
 +	TP_STRUCT__entry(
 +		__field(u64,	gpa)
 +	),
 +
 +	TP_fast_assign(
 +		__entry->gpa	= gpa;
 +	),
 +
 +	TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
 +);
 +
 +
 +

don't add multiple empty lines pls.

  /*
   * Tracepoint for cpuid.
   */
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 83b7b5c..a55d279 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
   if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
   skip_emulated_instruction(vcpu);
 + trace_kvm_fast_mmio(gpa);
   return 1;
   }
  
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 8f0f6ec..36cf78e 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
  EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
  
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 -- 
 2.1.4


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another for KVM_FAST_MMIO_BUS. This leads to two issues:
 
 - kvm_io_bus_destroy() knows nothing about devices on two buses
   pointing to a single dev, which leads to a double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause a userspace io emulation request instead
   of an eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fix this by not registering wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see if it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 Changes from V2:
 - Tweak styles and comment suggested by Cornelia.
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 31 +--
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 23 insertions(+), 24 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..c3ffdc3 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   return false;
  }
  
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
  {
 -	if (flags & KVM_IOEVENTFD_FLAG_PIO)
 +	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
  		return KVM_PIO_BUS;
 -	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 +	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
  		return KVM_VIRTIO_CCW_NOTIFY_BUS;
 -	return KVM_MMIO_BUS;
 +	/* When length is ignored, MMIO is put on a separate bus, for
 +	 * faster lookups.
 +	 */
 +	return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
  }
  
  static int
 @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   struct eventfd_ctx   *eventfd;
   int   ret;
  
 -	bus_idx = ioeventfd_bus_from_flags(args->flags);
 +	bus_idx = ioeventfd_bus_from_args(args);
  	/* must be natural-word sized, or 0 to ignore length */
  	switch (args->len) {
   case 0:
 @@ -843,16 +846,6 @@ 

[PATCH v3 1/4] irqchip: GICv3: Convert to EOImode == 1

2015-08-25 Thread Marc Zyngier
So far, GICv3 has been used with EOImode == 0. The effect of this
mode is to perform the priority drop and the deactivation of the
interrupt at the same time.

While this works perfectly for Linux (we only have a single priority),
it causes issues when an interrupt is forwarded to a guest, and when
we want the guest to perform the EOI itself.

For this case, the GIC architecture provides EOImode == 1, where:
- A write to ICC_EOIR1_EL1 drops the priority of the interrupt and
  leaves it active. Other interrupts at the same priority level can
  now be taken, but the active interrupt cannot be taken again
- A write to ICC_DIR_EL1 marks the interrupt as inactive, meaning
  it can now be taken again.

This patch converts the driver to be able to use this new mode,
depending on whether or not the kernel can behave as a hypervisor.
No feature change.
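
As an illustration of the two modes described above, here is a toy state
model of a single interrupt. It is only a sketch: struct irq_model and the
model_* helpers are hypothetical names, and the model ignores priorities,
groups and everything else the real CPU interface tracks.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt as seen by a CPU interface. */
struct irq_model {
	bool active;		/* acknowledged and not yet deactivated */
	bool prio_dropped;	/* running priority has been dropped */
};

/* Acknowledge: the interrupt becomes active at its priority. */
static void model_ack(struct irq_model *irq)
{
	irq->active = true;
	irq->prio_dropped = false;
}

/* EOI write: always drops priority; deactivates only in EOImode == 0. */
static void model_eoi(struct irq_model *irq, int eoimode)
{
	irq->prio_dropped = true;
	if (eoimode == 0)
		irq->active = false;
}

/* DIR write: in EOImode == 1 this marks the interrupt inactive. */
static void model_dir(struct irq_model *irq, int eoimode)
{
	if (eoimode == 1)
		irq->active = false;
}

/* An interrupt can be taken again only once it is no longer active. */
static bool model_can_retrigger(const struct irq_model *irq)
{
	return !irq->active;
}
```

With eoimode == 1, the sequence ack, EOI leaves the interrupt active until a
later DIR write, which is the window a guest can use to perform the
deactivation itself.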

Reviewed-by: Eric Auger eric.au...@linaro.org
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic-v3.c   | 39 ++
 include/linux/irqchip/arm-gic-v3.h |  9 +
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index c52f7ba..addd2ee 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -30,6 +30,7 @@
 #include <asm/cputype.h>
 #include <asm/exception.h>
 #include <asm/smp_plat.h>
+#include <asm/virt.h>
 
 #include "irq-gic-common.h"
 #include "irqchip.h"
@@ -50,6 +51,7 @@ struct gic_chip_data {
 };
 
 static struct gic_chip_data gic_data __read_mostly;
+static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 
 #define gic_data_rdist()   (this_cpu_ptr(gic_data.rdists.rdist))
 #define gic_data_rdist_rd_base()	(gic_data_rdist()->rd_base)
@@ -293,7 +295,14 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-	gic_write_eoir(gic_irq(d));
+	if (static_key_true(&supports_deactivate)) {
+		/* No need to deactivate an LPI */
+		if (gic_irq(d) >= 8192)
+			return;
+		gic_write_dir(gic_irq(d));
+	} else {
+		gic_write_eoir(gic_irq(d));
+	}
 }
 
 static int gic_set_type(struct irq_data *d, unsigned int type)
@@ -343,15 +352,26 @@ static asmlinkage void __exception_irq_entry 
gic_handle_irq(struct pt_regs *regs
 
 		if (likely(irqnr > 15 && irqnr < 1020) || irqnr >= 8192) {
 			int err;
+
+			if (static_key_true(&supports_deactivate))
+				gic_write_eoir(irqnr);
+
 			err = handle_domain_irq(gic_data.domain, irqnr, regs);
 			if (err) {
 				WARN_ONCE(true, "Unexpected interrupt received!\n");
-				gic_write_eoir(irqnr);
+				if (static_key_true(&supports_deactivate)) {
+					if (irqnr < 8192)
+						gic_write_dir(irqnr);
+				} else {
+					gic_write_eoir(irqnr);
+				}
 			}
 			continue;
 		}
 		if (irqnr < 16) {
 			gic_write_eoir(irqnr);
+			if (static_key_true(&supports_deactivate))
+				gic_write_dir(irqnr);
 #ifdef CONFIG_SMP
 			handle_IPI(irqnr, regs);
 #else
@@ -451,8 +471,13 @@ static void gic_cpu_sys_reg_init(void)
 	/* Set priority mask register */
 	gic_write_pmr(DEFAULT_PMR_VALUE);
 
-	/* EOI deactivates interrupt too (mode 0) */
-	gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+	if (static_key_true(&supports_deactivate)) {
+		/* EOI drops priority only (mode 1) */
+		gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop);
+	} else {
+		/* EOI deactivates interrupt too (mode 0) */
+		gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+	}
 
 	/* ... and let's hit the road... */
 	gic_write_grpen1(1);
@@ -820,6 +845,12 @@ static int __init gic_of_init(struct device_node *node, 
struct device_node *pare
 	if (of_property_read_u64(node, "redistributor-stride", &redist_stride))
 		redist_stride = 0;
 
+	if (!is_hyp_mode_available())
+		static_key_slow_dec(&supports_deactivate);
+
+	if (static_key_true(&supports_deactivate))
+		pr_info("GIC: Using split EOI/Deactivate mode\n");
+
 	gic_data.dist_base = dist_base;
 	gic_data.redist_regions = rdist_regs;
 	gic_data.nr_redist_regions = nr_redist_regions;
diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index ffbc034..bc98832 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ 

[PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1

2015-08-25 Thread Marc Zyngier
So far, GICv2 has been used with EOImode == 0. The effect of this
mode is to perform the priority drop and the deactivation of the
interrupt at the same time.

While this works perfectly for Linux (we only have a single priority),
it causes issues when an interrupt is forwarded to a guest, and when
we want the guest to perform the EOI itself.

For this case, the GIC architecture provides EOImode == 1, where:
- A write to the EOI register drops the priority of the interrupt
  and leaves it active. Other interrupts at the same priority level
  can now be taken, but the active interrupt cannot be taken again
- A write to the DIR marks the interrupt as inactive, meaning it can
  now be taken again.

We only enable this feature when booted in HYP mode and when the
device-tree reports a suitable CPU interface. Observable behaviour
should remain unchanged.
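
The enabling condition above could be sketched roughly as follows. This is
an assumption-laden illustration, not the driver code: model_use_split_eoi
is a hypothetical helper, and the size check assumes only that the
deactivate register (GICC_DIR, at offset 0x1000 of the CPU interface in the
GICv2 architecture) must fall inside the region the device-tree describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: GICC_DIR lives at offset 0x1000 of the CPU interface. */
#define MODEL_GICC_DIR_OFFSET	0x1000u

/*
 * Hypothetical decision helper: split EOI/Deactivate is only usable when
 * the kernel was booted in HYP mode and the 4-byte DIR register is covered
 * by the CPU interface region reported by the device-tree.
 */
static bool model_use_split_eoi(bool hyp_available, uint64_t cpu_if_size)
{
	if (!hyp_available)
		return false;

	return cpu_if_size >= MODEL_GICC_DIR_OFFSET + 4;
}
```

A CPU interface described as only 4kB would therefore keep the old
EOImode == 0 behaviour in this model, since a DIR write would land outside
the mapped region.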

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic.c   | 51 +++--
 include/linux/irqchip/arm-gic.h |  4 
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 4dd8826..505aaf3 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -46,6 +46,7 @@
 #include <asm/irq.h>
 #include <asm/exception.h>
 #include <asm/smp_plat.h>
+#include <asm/virt.h>
 
 #include "irq-gic-common.h"
 #include "irqchip.h"
@@ -82,6 +83,8 @@ static DEFINE_RAW_SPINLOCK(irq_controller_lock);
 #define NR_GIC_CPU_IF 8
 static u8 gic_cpu_map[NR_GIC_CPU_IF] __read_mostly;
 
+static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
+
 #ifndef MAX_GIC_NR
 #define MAX_GIC_NR 1
 #endif
@@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
 	return d->hwirq;
 }
 
+static inline bool primary_gic_irq(struct irq_data *d)
+{
+	if (MAX_GIC_NR > 1)
+		return irq_data_get_irq_chip_data(d) == &gic_data[0];
+
+	return true;
+}
+
 /*
  * Routines to acknowledge, disable and enable interrupts
  */
@@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-	writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
+	u32 deact_offset = GIC_CPU_EOI;
+
+	if (static_key_true(&supports_deactivate)) {
+		if (primary_gic_irq(d))
+			deact_offset = GIC_CPU_DEACTIVATE;
+	}
+
+	writel_relaxed(gic_irq(d), gic_cpu_base(d) + deact_offset);
 }
 
 static int gic_irq_set_irqchip_state(struct irq_data *d,
@@ -272,11 +290,15 @@ static void __exception_irq_entry gic_handle_irq(struct 
pt_regs *regs)
 		irqnr = irqstat & GICC_IAR_INT_ID_MASK;
 
 		if (likely(irqnr > 15 && irqnr < 1021)) {
+			if (static_key_true(&supports_deactivate))
+				writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
 			handle_domain_irq(gic->domain, irqnr, regs);
 			continue;
 		}
 		if (irqnr < 16) {
 			writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
+			if (static_key_true(&supports_deactivate))
+				writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE);
 #ifdef CONFIG_SMP
 			handle_IPI(irqnr, regs);
 #endif
@@ -359,6 +381,10 @@ static void gic_cpu_if_up(void)
 {
 	void __iomem *cpu_base = gic_data_cpu_base(&gic_data[0]);
 	u32 bypass = 0;
+	u32 mode = 0;
+
+	if (static_key_true(&supports_deactivate))
+		mode = GIC_CPU_CTRL_EOImodeNS;
 
/*
* Preserve bypass disable bits to be written back later
@@ -366,7 +392,7 @@ static void gic_cpu_if_up(void)
 	bypass = readl(cpu_base + GIC_CPU_CTRL);
 	bypass &= GICC_DIS_BYPASS_MASK;
 
-	writel_relaxed(bypass | GICC_ENABLE, cpu_base + GIC_CPU_CTRL);
+	writel_relaxed(bypass | mode | GICC_ENABLE, cpu_base + GIC_CPU_CTRL);
 }
 
 
@@ -986,6 +1012,8 @@ void __init gic_init_bases(unsigned int gic_nr, int 
irq_start,
 		register_cpu_notifier(&gic_cpu_notifier);
 #endif
 		set_handle_irq(gic_handle_irq);
+		if (static_key_true(&supports_deactivate))
+			pr_info("GIC: Using split EOI/Deactivate mode\n");
 	}
 
gic_dist_init(gic);
@@ -1001,6 +1029,7 @@ gic_of_init(struct device_node *node, struct device_node 
*parent)
 {
void __iomem *cpu_base;
void __iomem *dist_base;
+   struct resource cpu_res;
u32 percpu_offset;
int irq;
 
@@ -1013,6 +1042,16 @@ gic_of_init(struct device_node *node, struct device_node 
*parent)
 	cpu_base = of_iomap(node, 1);
 	WARN(!cpu_base, "unable to map gic cpu registers\n");
 
+	of_address_to_resource(node, 1, &cpu_res);
+
+   /*
+* Disable split EOI/Deactivate if either HYP is not available
+* or the CPU interface is too small.
+*/

Re: [PATCH v2 10/15] KVM: arm64: add data structures to model ITS interrupt translation

2015-08-25 Thread Andre Przywara
Hi Eric,

On 13/08/15 16:46, Eric Auger wrote:
 
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The GICv3 Interrupt Translation Service (ITS) uses tables in memory
 to allow a sophisticated interrupt routing. It features device tables,
 an interrupt table per device and a table connecting collections to
 actual CPUs (aka. redistributors in the GICv3 lingo).
 Since the interrupt numbers for the LPIs are allocated quite sparsely
 and the range can be quite huge (8192 LPIs being the minimum), using
 bitmaps or arrays for storing information is a waste of memory.
 We use linked lists instead, which we iterate linearly. This works
 very well with the actual number of LPIs/MSIs in the guest being
 quite low. Should the number of LPIs exceed the number where iterating
 through lists seems acceptable, we can later revisit this and use more
 efficient data structures.
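
A rough idea of what such a linear lookup costs can be had from the sketch
below. It uses plain singly linked lists and hypothetical model_* names
rather than the kernel's struct list_head, so it only illustrates the
device -> ITT -> LPI walk, not the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified interrupt translation table entry. */
struct model_itte {
	struct model_itte *next;
	uint32_t event_id;
	uint32_t lpi;
};

/* Hypothetical, simplified device with its own list of entries. */
struct model_device {
	struct model_device *next;
	struct model_itte *itt;
	uint32_t device_id;
};

/* Walk the device list, then that device's ITT; both walks are O(n). */
static const struct model_itte *
model_find_itte(const struct model_device *devs, uint32_t device_id,
		uint32_t event_id)
{
	const struct model_device *dev;
	const struct model_itte *itte;

	for (dev = devs; dev; dev = dev->next) {
		if (dev->device_id != device_id)
			continue;
		for (itte = dev->itt; itte; itte = itte->next)
			if (itte->event_id == event_id)
				return itte;
		return NULL;	/* device found, event not mapped */
	}
	return NULL;		/* unknown device */
}
```

Memory use is proportional to the number of mapped events instead of the
size of the LPI number space, at the price of a lookup that grows with the
list lengths.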

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  3 +++
  virt/kvm/arm/its-emul.c | 48 
 
  2 files changed, 51 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index b432055..1648668 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -25,6 +25,7 @@
  #include <linux/spinlock.h>
  #include <linux/types.h>
  #include <kvm/iodev.h>
 +#include <linux/list.h>
  
  #define VGIC_NR_IRQS_LEGACY 256
  #define VGIC_NR_SGIS		16
 @@ -162,6 +163,8 @@ struct vgic_its {
  u64 cbaser;
  int creadr;
  int cwriter;
 +	struct list_head	device_list;
 +	struct list_head	collection_list;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b498f06..7f217fa 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -21,6 +21,7 @@
  #include <linux/kvm.h>
  #include <linux/kvm_host.h>
  #include <linux/interrupt.h>
 +#include <linux/list.h>
  
  #include <linux/irqchip/arm-gic-v3.h>
  #include <kvm/arm_vgic.h>
 @@ -32,6 +33,25 @@
  #include "vgic.h"
  #include "its-emul.h"
  
 +struct its_device {
 +	struct list_head dev_list;
 +	struct list_head itt;
 +	u32 device_id;
 +};
 +
 +struct its_collection {
 +	struct list_head coll_list;
 +	u32 collection_id;
 +	u32 target_addr;
 +};
 +
 +struct its_itte {
 +	struct list_head itte_list;
 +	struct its_collection *collection;
 +	u32 lpi;
 +	u32 event_id;
 +};
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
  
  /* The distributor lock is held by the VGIC MMIO handler. */
 @@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm)
  
  	spin_lock_init(&its->lock);
  
 +	INIT_LIST_HEAD(&its->device_list);
 +	INIT_LIST_HEAD(&its->collection_list);
 +
  	its->enabled = false;
  
  return -ENXIO;
 @@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm)
  {
  	struct vgic_dist *dist = &kvm->arch.vgic;
  	struct vgic_its *its = &dist->its;
 +	struct its_device *dev;
 +	struct its_itte *itte;
 +	struct list_head *dev_cur, *dev_temp;
 +	struct list_head *cur, *temp;
  
  	if (!vgic_has_its(kvm))
  		return;
  
 +	if (!its->device_list.next)
 Why not use list_empty()? But I think I would simply remove this since
 the empty case is handled below...

list_empty() requires the list to be initialized before. This check here
is to detect that map_resources was never called (this is only done on
the first VCPU run) and thus device_list is basically still all zeroes.
If we abort the guest without ever running a VCPU (for instance because
some initialization failed), we call vits_destroy() anyway (because this
is called when tearing down the VGIC device).
So the check is here to detect early that vits_destroy() has been called
without the ITS ever having been fully initialized. This fixed a real bug when
the guest start was aborted before the ITS was ever used.
I will add a comment to make this clear.
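
To make the distinction concrete, here is a small userspace sketch with a
stand-in for struct list_head. The helper names are hypothetical; the point
is only that a zeroed head is neither "empty" in the list_empty() sense nor
safe to iterate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized, empty list head points to itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same test list_empty() performs: does the head point to itself? */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* True when the head is still all zeroes, i.e. it was never initialized. */
static bool list_never_initialized(const struct list_head *h)
{
	return h->next == NULL;
}
```

On a zeroed head, list_is_empty() returns false, so a loop guarded only by
that test would go on to follow a NULL next pointer. Checking the next
pointer for NULL first is what catches the never-initialized case.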

 +		return;
 +
 +	spin_lock(&its->lock);
 +	list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
 +		dev = container_of(dev_cur, struct its_device, dev_list);
 isn't the usage of list_for_each_entry_safe more synthetic here?

If I got this correctly, we need the _safe variant if we want to remove
the list item within the loop. Or am I missing something here?

Cheers,
Andre.
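
For reference, the property both list_for_each_safe() and
list_for_each_entry_safe() provide is a look-ahead cursor that is read
before the current element may be freed; the _entry variant additionally
folds the container_of() step into the iterator. A minimal userspace sketch
of that pattern, with hypothetical names and a plain singly linked list:

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int id;
};

/* Push a new node at the front; returns the new head. */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;	/* allocation failed: list unchanged */
	n->id = id;
	n->next = head;
	return n;
}

/*
 * Free every node and return how many were freed. The successor is saved
 * before free(), which is the role the extra cursor plays in the *_safe
 * list iterators.
 */
static int free_all(struct node *head)
{
	struct node *cur = head, *tmp;
	int freed = 0;

	while (cur) {
		tmp = cur->next;	/* read next while cur is still valid */
		free(cur);
		cur = tmp;
		freed++;
	}
	return freed;
}
```

Reading cur->next after free(cur) would be a use-after-free, which is why
the non-safe iterators cannot be used when elements are removed in the loop
body.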


 +		list_for_each_safe(cur, temp, &dev->itt) {
 +			itte = (container_of(cur, struct its_itte, itte_list));
 same
 
 Eric
 +			list_del(cur);
 +			kfree(itte);
 +		}
 +		list_del(dev_cur);
 +		kfree(dev);
 +	}
 +
 +	list_for_each_safe(cur, temp, &its->collection_list) {
 +		list_del(cur);
 +		kfree(container_of(cur, struct its_collection, coll_list));
 +	}
 +
  	kfree(dist->pendbaser);
  
  	its->enabled = false;
 +	spin_unlock(&its->lock);
  }

 

Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote:
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  arch/x86/kvm/trace.h | 17 +
  arch/x86/kvm/vmx.c   |  1 +
  arch/x86/kvm/x86.c   |  1 +
  3 files changed, 19 insertions(+)
 
 diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
 index 4eae7c3..2d4e81a 100644
 --- a/arch/x86/kvm/trace.h
 +++ b/arch/x86/kvm/trace.h
 @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
 		  __entry->count > 1 ? "(...)" : "")
  );
  
 +TRACE_EVENT(kvm_fast_mmio,
 +	TP_PROTO(u64 gpa),
 +	TP_ARGS(gpa),
 +
 +	TP_STRUCT__entry(
 +		__field(u64,	gpa)
 +	),
 +
 +	TP_fast_assign(
 +		__entry->gpa	= gpa;
 +	),
 +
 +	TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
 +);
 +
 +
 +

don't add multiple empty lines please.

  /*
   * Tracepoint for cpuid.
   */
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 83b7b5c..a55d279 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
   if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
   skip_emulated_instruction(vcpu);
 + trace_kvm_fast_mmio(gpa);
   return 1;
   }
  
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 8f0f6ec..36cf78e 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
  EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
  
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 -- 
 2.1.4


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another for KVM_FAST_MMIO_BUS. This leads to two issues:
 
 - kvm_io_bus_destroy() knows nothing about devices on two buses
   pointing to a single dev, which leads to a double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause a userspace io emulation request instead
   of an eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fix this by not registering wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see if it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

Saw v3 too late. Pls see my comments on v2.

 ---
 Changes from V2:
 - Tweak styles and comment suggested by Cornelia.
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 31 +--
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 23 insertions(+), 24 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..c3ffdc3 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   return false;
  }
  
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
  {
 -	if (flags & KVM_IOEVENTFD_FLAG_PIO)
 +	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
  		return KVM_PIO_BUS;
 -	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 +	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
  		return KVM_VIRTIO_CCW_NOTIFY_BUS;
 -	return KVM_MMIO_BUS;
 +	/* When length is ignored, MMIO is put on a separate bus, for
 +	 * faster lookups.
 +	 */
 +	return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
  }
  
  static int
 @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   struct eventfd_ctx   *eventfd;
   int   ret;
  
 -	bus_idx = ioeventfd_bus_from_flags(args->flags);
 +	bus_idx = ioeventfd_bus_from_args(args);
  	/* must be natural-word sized, or 0 to ignore length */
  	switch (args->len) {
  

[PATCH v3 0/4] irqchip: GICv2/v3: Add support for irq_vcpu_affinity

2015-08-25 Thread Marc Zyngier
The GICv2 and GICv3 architectures allow an active physical interrupt
to be forwarded to a guest, and the guest to indirectly perform the
deactivation of the interrupt by performing an EOI on the virtual
interrupt (see for example the GICv2 spec, 3.2.1).

This allows some substantial performance improvement for level
triggered interrupts that otherwise have to be masked/unmasked in
VFIO, not to mention the required trap back to KVM when the guest
performs an EOI.

To enable this, the GICs need to be switched to a different EOImode,
where a taken interrupt can be left active (which prevents the same
interrupt from being taken again), while other interrupts are still
being processed normally.

We also use the new irq_set_vcpu_affinity hook that was introduced for
Intel's Posted Interrupts to determine whether or not to perform the
deactivation at EOI-time.

As all of this only makes sense when the kernel can behave as a
hypervisor, we only enable this mode on detecting that the kernel was
actually booted in HYP mode, and that the GIC supports this feature.

This series is a complete rework of a RFC I sent over a year ago:

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266328.html

Since then, a lot has been either merged (the irqchip_state) or reworked
(my active-timer series: http://www.spinics.net/lists/kvm/msg118768.html),
and this implements the last few bits for Eric Auger's series to
finally make it into the kernel:

https://lkml.org/lkml/2015/7/2/268
https://lkml.org/lkml/2015/7/6/291

With all these patches combined, physical interrupt routing from the
kernel into a VM becomes possible.

This has been tested on Juno (GICv2) and FastModel (GICv3), and Eric
tested it on a Calxeda Midway. A branch is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
irq/gic-irq-vcpu-affinity-v2

* From v2:
  - Another small fix from Eric
  - Some commit message cleanups

* From v1:
  - Fixes after review from Eric
  - Got rid of the cascaded GICv2 hack (it was broken anyway)
  - Folded the LPI deactivation patch (it makes more sense as part of
    the main one).
  - Some clarifying comments about the deactivate on mask
  - I haven't retained Eric's Reviewed/Tested-by, as the code has
    significantly changed on GICv2

Marc Zyngier (4):
  irqchip: GICv3: Convert to EOImode == 1
  irqchip: GICv3: Don't deactivate interrupts forwarded to a guest
  irqchip: GIC: Convert to EOImode == 1
  irqchip: GIC: Don't deactivate interrupts forwarded to a guest

 drivers/irqchip/irq-gic-v3.c   |  70 +--
 drivers/irqchip/irq-gic.c  | 111 -
 include/linux/irqchip/arm-gic-v3.h |   9 +++
 include/linux/irqchip/arm-gic.h|   4 ++
 4 files changed, 188 insertions(+), 6 deletions(-)

-- 
2.1.4



[PATCH v3 4/4] irqchip: GIC: Don't deactivate interrupts forwarded to a guest

2015-08-25 Thread Marc Zyngier
Commit 0a4377de3056 ("genirq: Introduce irq_set_vcpu_affinity() to
target an interrupt to a VCPU") added just what we needed at the
lowest level to allow an interrupt to be deactivated by a guest.

When such a request reaches the GIC, it knows it doesn't need to
perform the deactivation anymore, and can safely let the guest
do its magic. This of course requires additional support in both
VFIO and KVM.
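
As an illustration only (this is not code from the series), here is a
minimal userspace sketch of the behaviour described above, assuming
EOImode == 1 so that deactivation is a step of its own. struct model_irq
and the model_*() helpers are hypothetical names; the real driver writes
GIC registers instead of toggling fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interrupt state; not a kernel structure. */
struct model_irq {
	void *vcpu;	/* non-NULL once forwarded to a vcpu */
	bool active;	/* set on acknowledge, cleared on deactivation */
	bool masked;
};

/* Plays the role of irq_set_vcpu_affinity(): remember the target vcpu. */
static void model_set_vcpu_affinity(struct model_irq *irq, void *vcpu)
{
	irq->vcpu = vcpu;
}

static bool model_forwarded(const struct model_irq *irq)
{
	return irq->vcpu != NULL;
}

/* Acknowledging an interrupt makes it active. */
static void model_ack(struct model_irq *irq)
{
	irq->active = true;
}

/* Host EOI: skip the deactivation when the guest is expected to do it. */
static void model_eoi(struct model_irq *irq)
{
	if (model_forwarded(irq))
		return;
	irq->active = false;
}

/* Masking a forwarded interrupt also deactivates it, so it cannot get stuck. */
static void model_mask(struct model_irq *irq)
{
	irq->masked = true;
	if (model_forwarded(irq))
		irq->active = false;
}
```

In this model a forwarded interrupt stays active after model_eoi() until
either the guest deactivates it (not modelled here) or the host masks it.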

Reviewed-by: Eric Auger <eric.au...@linaro.org>
Tested-by: Eric Auger <eric.au...@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
 drivers/irqchip/irq-gic.c | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 505aaf3..5e48850 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -148,6 +148,36 @@ static inline bool primary_gic_irq(struct irq_data *d)
 	return true;
 }
 
+static inline bool cascading_gic_irq(struct irq_data *d)
+{
+	void *data = irq_data_get_irq_handler_data(d);
+
+	/*
+	 * If handler_data is pointing to one of the secondary GICs, then
+	 * this is a cascading interrupt, and it cannot possibly be
+	 * forwarded.
+	 */
+	if (data >= (void *)(gic_data + 1) &&
+	    data <  (void *)(gic_data + MAX_GIC_NR))
+		return true;
+
+	return false;
+}
+
+static inline bool forwarded_irq(struct irq_data *d)
+{
+	/*
+	 * A forwarded interrupt:
+	 * - is on the primary GIC
+	 * - has its handler_data set to a value
+	 * - that isn't a secondary GIC
+	 */
+	if (primary_gic_irq(d) && d->handler_data && !cascading_gic_irq(d))
+		return true;
+
+	return false;
+}
+
 /*
  * Routines to acknowledge, disable and enable interrupts
  */
@@ -166,6 +196,18 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
 static void gic_mask_irq(struct irq_data *d)
 {
 	gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
+	/*
+	 * When masking a forwarded interrupt, make sure it is
+	 * deactivated as well.
+	 *
+	 * This ensures that an interrupt that is getting
+	 * disabled/masked will not get stuck, because there is
+	 * no one to deactivate it (guest is being terminated).
+	 */
+	if (static_key_true(&supports_deactivate)) {
+		if (forwarded_irq(d))
+			gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR);
+	}
 }
 
 static void gic_unmask_irq(struct irq_data *d)
@@ -178,6 +220,10 @@ static void gic_eoi_irq(struct irq_data *d)
 	u32 deact_offset = GIC_CPU_EOI;
 
 	if (static_key_true(&supports_deactivate)) {
+		/* Do not deactivate an IRQ forwarded to a vcpu. */
+		if (forwarded_irq(d))
+			return;
+
 		if (primary_gic_irq(d))
 			deact_offset = GIC_CPU_DEACTIVATE;
 	}
@@ -251,6 +297,19 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
 	return gic_configure_irq(gicirq, type, base, NULL);
 }
 
+static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
+{
+	/* Only interrupts on the primary GIC can be forwarded to a vcpu. */
+	if (static_key_true(&supports_deactivate)) {
+		if (primary_gic_irq(d) && !cascading_gic_irq(d)) {
+			d->handler_data = vcpu;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
 #ifdef CONFIG_SMP
 static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
bool force)
@@ -346,6 +405,7 @@ static struct irq_chip gic_chip = {
 #endif
.irq_get_irqchip_state  = gic_irq_get_irqchip_state,
.irq_set_irqchip_state  = gic_irq_set_irqchip_state,
+   .irq_set_vcpu_affinity  = gic_irq_set_vcpu_affinity,
.flags  = IRQCHIP_SET_TYPE_MASKED,
 };
 
-- 
2.1.4



Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote:
 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another for KVM_FAST_MMIO_BUS. This leads to two issues:
 
 - kvm_io_bus_destroy() knows nothing about devices on two buses that
   point to a single dev, which leads to a double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause a userspace io emulation request instead
   of an eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fix this by not registering wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see if it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

I'm worried that this slows down regular MMIO.
Could you share performance numbers please?
You need a mix of len=0 and len=2 matches.

One solution for the first issue is to create two ioeventfd objects instead.
For the second issue, we could change the bsearch compare function instead.
Again, this affects all devices, so performance numbers would be needed.
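
To make the second idea concrete, here is a minimal userspace sketch of a
compare function that treats a zero-length registered range as a wildcard.
It is only one possible reading of the suggestion, not kernel code:
struct io_range, io_range_cmp() and io_range_find() are hypothetical names,
and the table is assumed to be sorted by address with non-overlapping
entries.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registered I/O range. */
struct io_range {
	uint64_t addr;
	int len;	/* 0 means "wildcard: any access length" */
};

/* bsearch() calls this with (key, table element). */
static int io_range_cmp(const void *k, const void *e)
{
	const struct io_range *key = k;
	const struct io_range *ent = e;

	if (key->addr < ent->addr)
		return -1;
	/* A zero-length entry matches any access that starts at its address. */
	if (ent->len == 0)
		return key->addr == ent->addr ? 0 : 1;
	/* Otherwise the access must fit inside the registered range. */
	if (key->addr + key->len > ent->addr + ent->len)
		return 1;
	return 0;
}

/* Look up the entry covering [addr, addr + len), or NULL if there is none. */
static const struct io_range *io_range_find(const struct io_range *tbl,
					    size_t n, uint64_t addr, int len)
{
	struct io_range key = { addr, len };

	return bsearch(&key, tbl, n, sizeof(*tbl), io_range_cmp);
}
```

With a table of { 0x1000, 4 }, { 0x2000, 0 }, { 0x3000, 8 }, a 2-byte
access at 0x2000 now finds the wildcard entry. The extra branch runs on
every lookup, which is why the performance numbers still matter.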

 ---
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 30 --
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 22 insertions(+), 24 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..95f2901 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
   return false;
  }
  
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
  {
 - if (flags & KVM_IOEVENTFD_FLAG_PIO)
 + if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
   return KVM_PIO_BUS;
 - if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 + if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
   return KVM_VIRTIO_CCW_NOTIFY_BUS;
 - return KVM_MMIO_BUS;
 + if (args->len)
 + return KVM_MMIO_BUS;
 + return KVM_FAST_MMIO_BUS;
  }
  
  static int
 @@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
   struct eventfd_ctx   *eventfd;
   int   ret;
  
 - bus_idx = 

[PATCH v3 2/4] irqchip: GICv3: Don't deactivate interrupts forwarded to a guest

2015-08-25 Thread Marc Zyngier
Commit 0a4377de3056 ("genirq: Introduce irq_set_vcpu_affinity() to
target an interrupt to a VCPU") added just what we needed at the
lowest level to allow an interrupt to be deactivated by a guest.

When such a request reaches the GIC, it knows it doesn't need to
perform the deactivation anymore, and can safely let the guest
do its magic. This of course requires additional support in both
VFIO and KVM.

Reviewed-by: Eric Auger <eric.au...@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
 drivers/irqchip/irq-gic-v3.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index addd2ee..5aa9bf6 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -70,6 +70,11 @@ static inline int gic_irq_in_rdist(struct irq_data *d)
 	return gic_irq(d) < 32;
 }
 
+static inline bool forwarded_irq(struct irq_data *d)
+{
+	return d->handler_data != NULL;
+}
+
 static inline void __iomem *gic_dist_base(struct irq_data *d)
 {
 	if (gic_irq_in_rdist(d))	/* SGI+PPI -> SGI_base for this CPU */
@@ -231,6 +236,18 @@ static void gic_poke_irq(struct irq_data *d, u32 offset)
 static void gic_mask_irq(struct irq_data *d)
 {
 	gic_poke_irq(d, GICD_ICENABLER);
+	/*
+	 * When masking a forwarded interrupt, make sure it is
+	 * deactivated as well.
+	 *
+	 * This ensures that an interrupt that is getting
+	 * disabled/masked will not get stuck, because there is
+	 * no one to deactivate it (guest is being terminated).
+	 */
+	if (static_key_true(&supports_deactivate)) {
+		if (forwarded_irq(d))
+			gic_poke_irq(d, GICD_ICACTIVER);
+	}
 }
 
 static void gic_unmask_irq(struct irq_data *d)
@@ -296,8 +313,11 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
 static void gic_eoi_irq(struct irq_data *d)
 {
 	if (static_key_true(&supports_deactivate)) {
-		/* No need to deactivate an LPI */
-		if (gic_irq(d) >= 8192)
+		/*
+		 * No need to deactivate an LPI, or an interrupt that
+		 * is getting forwarded to a vcpu.
+		 */
+		if (gic_irq(d) >= 8192 || forwarded_irq(d))
 			return;
 		gic_write_dir(gic_irq(d));
 	} else {
@@ -331,6 +351,16 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
 	return gic_configure_irq(irq, type, base, rwp_wait);
 }
 
+static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
+{
+	if (static_key_true(&supports_deactivate)) {
+		d->handler_data = vcpu;
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
 static u64 gic_mpidr_to_affinity(u64 mpidr)
 {
u64 aff;
@@ -683,6 +713,7 @@ static struct irq_chip gic_chip = {
.irq_set_affinity   = gic_set_affinity,
.irq_get_irqchip_state  = gic_irq_get_irqchip_state,
.irq_set_irqchip_state  = gic_irq_set_irqchip_state,
+   .irq_set_vcpu_affinity  = gic_irq_set_vcpu_affinity,
.flags  = IRQCHIP_SET_TYPE_MASKED,
 };
 
-- 
2.1.4



RE: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts

2015-08-25 Thread Wu, Feng


 -----Original Message-----
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 26, 2015 3:58 AM
 To: Wu, Feng
 Cc: pbonz...@redhat.com; j...@8bytes.org; mtosa...@redhat.com;
 eric.au...@linaro.org; kvm@vger.kernel.org;
 io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
 Subject: Re: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts
 
 On Tue, 2015-08-25 at 16:50 +0800, Feng Wu wrote:
  This patch adds the routine to update IRTE for posted-interrupts
  when guest changes the interrupt configuration.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
   arch/x86/kvm/x86.c | 73 ++
   1 file changed, 73 insertions(+)
  +   kvm_set_msi_irq(e, &irq);
  +   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
  +       continue;
  +
  +   vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
  +   vcpu_info.vector = irq.vector;
  +
  +   if (set)
  +       ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
  +   else {
  +       /* suppress notification event before unposting */
  +       kvm_x86_ops->pi_set_sn(vcpu);
  +       ret = irq_set_vcpu_affinity(host_irq, NULL);
  +       kvm_x86_ops->pi_clear_sn(vcpu);
  +   }
 
 Can we add trace events so that we have a way to tell when PI is being
 enabled/disabled other than performance heuristics?  Thanks,

Sure, I will add it.

Thanks,
Feng

 
 Alex
  
 



Re: I'm now looking into kvm-unit-tests and encounted with some problems.

2015-08-25 Thread Huangpeng (Peter)
You should add the kvm mailing list too.

-----Original Message-----
From: Jinjian (Ken)
Sent: August 25, 2015 21:42
To: drjo...@redhat.com; pbonz...@redhat.com
Cc: Huangpeng (Peter); Gonglei (Arei); Zhanghailiang
Subject: I'm now looking into kvm-unit-tests and encounted with some problems.

Hi all:
I'm now looking into kvm-unit-tests and have encountered some problems.

1. When I run run_test.sh, it reports "exec: {config_fd}: not found".
How and where should it be defined?

2. All tests run with -smp 2 (or 3) hang.
For example, run the apic unit test with the following command:
qemu-kvm --enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel 
x86/apic.flat -smp 2 -vnc none
Result: it enters an endless loop.
The related code is:
x86/apic.c
   test_sti_nmi()
 on_cpu_async(1, sti_loop, 0);

   static void sti_loop(void *ignore)
   {
  unsigned k = 0;

  while (sti_loop_active) {
sti_nop((char *)(ulong)((k++ * 4096) % (128 * 1024 * 1024)));
  }
   }

3. The s3 kvm-unit-test hangs.
Run the s3 unit test with the following command:
qemu-kvm --enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel 
x86/s3.flat -vnc none
s3 hangs at the resume event.
Logs:
RSDP is at f62c0
RSDT is at 7fe16a9
FADT is at 7fe0bda
FACS is at 7fe
resume vector addr is 7fe000c
copy resume code from 400350

4. QEMU exits when we run the emulator unit test, and the test still fails
after the problematic code is commented out.
Run the emulator unit test with the following command:
qemu-kvm --enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4  -serial stdio -device pci-testdev 
-kernel x86/emulator.flat -vnc none
Result: QEMU exits during test_muldiv.
Logs:
PASS: imul rax, mem, imm
unhandled cpu excecption 8

If the code that causes QEMU to exit is commented out, the test still fails.
Logs:
FAIL: mov null, %ss

Question:
What do you think is the cause of these problems? I look forward to your reply.

Thank you in advance.





[PATCH v2 0/3] KVM: Dynamic halt_poll_ns

2015-08-25 Thread Wanpeng Li
v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
   the module parameter
 * use the shrink/grow matrix suggested by David
 * set halt_poll_ns_max to 2ms

halt_poll_ns has a downside: polling still happens for idle vCPUs,
which wastes CPU time. This patchset adds the ability to adjust
halt_poll_ns dynamically.

There are two new kernel parameters for changing halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
rounded down to the closest multiple of halt_poll_ns_grow. The shrink/grow
matrix is suggested by David:

  if (poll successfully for interrupt): stay the same
  else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
  else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

  halt_poll_ns_shrink/ |
  halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
  ---------------------+----------------------+-----------------------
  < 1                  |  = halt_poll_ns      |  = 0
  < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
  otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink
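
For a quick feel of the matrix, here is a small standalone sketch of the
two adjustments. It is a model for illustration, not the patch itself:
struct poll_params and the sketch_*() helpers are hypothetical names, the
parameters are passed in explicitly instead of being module parameters,
and the underflow guard in the last row is an addition of this sketch.

```c
/* Hypothetical stand-ins for the module parameters. */
struct poll_params {
	unsigned int base;	/* halt_poll_ns */
	unsigned int grow;	/* halt_poll_ns_grow */
	unsigned int shrink;	/* halt_poll_ns_shrink */
};

static unsigned int sketch_grow(const struct poll_params *p, unsigned int val)
{
	if (p->grow < 1)
		return p->base;		/* "< 1" row */
	if (val == 0)
		return p->base;		/* start from the base, as patch 2/3 does */
	if (p->grow < p->base)
		return val * p->grow;	/* "< halt_poll_ns" row */
	return val + p->grow;		/* "otherwise" row */
}

static unsigned int sketch_shrink(const struct poll_params *p, unsigned int val)
{
	if (p->shrink < 1)
		return 0;		/* "< 1" row */
	if (p->shrink < p->base)
		return val / p->shrink;	/* "< halt_poll_ns" row */
	/* "otherwise" row; the clamp at 0 is specific to this sketch */
	return val > p->shrink ? val - p->shrink : 0;
}
```

With base 1000, grow 2 and shrink 0, a vcpu goes 0 -> 1000 -> 2000 over
two grows and drops back to 0 on the first shrink.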

Wanpeng Li (3):
  KVM: make halt_poll_ns per-VCPU
  KVM: dynamic halt_poll_ns adjustment
  KVM: trace kvm_halt_poll_ns grow/shrink

 include/linux/kvm_host.h   |  1 +
 include/trace/events/kvm.h | 30 ++
 virt/kvm/kvm_main.c| 78 --
 3 files changed, 106 insertions(+), 3 deletions(-)

-- 
1.9.1



[PATCH v2 3/3] KVM: trace kvm_halt_poll_ns grow/shrink

2015-08-25 Thread Wanpeng Li
Tracepoint for dynamic halt_poll_ns, fired on every potential change.

Signed-off-by: Wanpeng Li <wanpeng...@hotmail.com>
---
 include/trace/events/kvm.h | 30 ++
 virt/kvm/kvm_main.c|  8 
 2 files changed, 38 insertions(+)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a44062d..75ddf80 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -356,6 +356,36 @@ TRACE_EVENT(
 		  __entry->address)
 );
 
+TRACE_EVENT(kvm_halt_poll_ns,
+	TP_PROTO(bool grow, unsigned int vcpu_id, int new, int old),
+	TP_ARGS(grow, vcpu_id, new, old),
+
+	TP_STRUCT__entry(
+		__field(bool, grow)
+		__field(unsigned int, vcpu_id)
+		__field(int, new)
+		__field(int, old)
+	),
+
+	TP_fast_assign(
+		__entry->grow           = grow;
+		__entry->vcpu_id        = vcpu_id;
+		__entry->new            = new;
+		__entry->old            = old;
+	),
+
+	TP_printk("vcpu %u: halt_poll_ns %d (%s %d)",
+			__entry->vcpu_id,
+			__entry->new,
+			__entry->grow ? "grow" : "shrink",
+			__entry->old)
+);
+
+#define trace_kvm_halt_poll_ns_grow(vcpu_id, new, old) \
+   trace_kvm_halt_poll_ns(true, vcpu_id, new, old)
+#define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \
+   trace_kvm_halt_poll_ns(false, vcpu_id, new, old)
+
 #endif
 
 #endif /* _TRACE_KVM_MAIN_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2a4962b..04f62e0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1957,13 +1957,21 @@ static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)
 
 static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
+	int old = vcpu->halt_poll_ns;
+
 	vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
+
+	trace_kvm_halt_poll_ns_grow(vcpu->vcpu_id, vcpu->halt_poll_ns, old);
 }
 
 static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
+	int old = vcpu->halt_poll_ns;
+
 	vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
 			halt_poll_ns_shrink, halt_poll_ns);
+
+	trace_kvm_halt_poll_ns_shrink(vcpu->vcpu_id, vcpu->halt_poll_ns, old);
 }
 
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
-- 
1.9.1



[PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-25 Thread Wanpeng Li
halt_poll_ns has a downside: polling still happens for idle vCPUs,
which wastes CPU time. This patch adds the ability to adjust
halt_poll_ns dynamically.

There are two new kernel parameters for changing halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
rounded down to the closest multiple of halt_poll_ns_grow. The shrink/grow
matrix is suggested by David:

  if (poll successfully for interrupt): stay the same
  else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
  else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

  halt_poll_ns_shrink/ |
  halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
  ---------------------+----------------------+-----------------------
  < 1                  |  = halt_poll_ns      |  = 0
  < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
  otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

Signed-off-by: Wanpeng Li <wanpeng...@hotmail.com>
---
 virt/kvm/kvm_main.c | 65 -
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 93db833..2a4962b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,9 +66,26 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
-static unsigned int halt_poll_ns;
+#define KVM_HALT_POLL_NS  500000
+#define KVM_HALT_POLL_NS_GROW   2
+#define KVM_HALT_POLL_NS_SHRINK 0
+#define KVM_HALT_POLL_NS_MAX 2000000
+
+static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
 module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
 
+/* Default doubles per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
+module_param(halt_poll_ns_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu halt_poll_ns . */
+static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
+module_param(halt_poll_ns_shrink, int, S_IRUGO);
+
+/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */
+static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
+module_param(halt_poll_ns_max, int, S_IRUGO);
+
 /*
  * Ordering of locks:
  *
@@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
 
+static unsigned int __grow_halt_poll_ns(unsigned int val)
+{
+	if (halt_poll_ns_grow < 1)
+		return halt_poll_ns;
+
+	val = min(val, halt_poll_ns_max);
+
+	if (val == 0)
+		return halt_poll_ns;
+
+	if (halt_poll_ns_grow < halt_poll_ns)
+		val *= halt_poll_ns_grow;
+	else
+		val += halt_poll_ns_grow;
+
+	return val;
+}
+
+static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)
+{
+	if (modifier < 1)
+		return 0;
+
+	if (modifier < halt_poll_ns)
+		val /= modifier;
+	else
+		val -= modifier;
+
+	return val;
+}
+
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+	vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
+}
+
+static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+	vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
+			halt_poll_ns_shrink, halt_poll_ns);
+}
+
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 {
if (kvm_arch_vcpu_runnable(vcpu)) {
@@ -1954,6 +2013,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 			break;
 
 		waited = true;
+		if (vcpu->halt_poll_ns > halt_poll_ns_max)
+			shrink_halt_poll_ns(vcpu);
+		else
+			grow_halt_poll_ns(vcpu);
 		schedule();
 	}
 
-- 
1.9.1



Re: KVM slow LAMP guest

2015-08-25 Thread Wanpeng Li

On 8/26/15 6:41 AM, Hansa wrote:

On 26-8-2015 0:33, Wanpeng Li wrote:

On the VM server I issued the command below every eleven minutes:

date   curltest-file; _
top -b -n 1 | sed -n '7,12p'  curltest-file; _
curl -o /dev/null -s -wtime_total: %{time_total}\\n
https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print
strftime(%Y-%m-%d %H:%M:%S , localtime)'  curltest-file

This gives me the total time for displaying my site on a local
machine. It also includes a 'top' command to display which processes
are running at each sample. All is saved in a file called
curltest-file.

I found 7 occurrences in my curltest-file of a time_total larger
than 20 seconds. Top however didn't show any significant CPU or IO
activity at those sampled times. Further investigations shows me
that they are related to a known (gravatar)  issue in the Wordpress
Jetpack plugin. I didn't include these samples in the average total.


Do you use only halt_poll_ns, or both halt_poll_ns and idle=poll in the
guest?


I just use kvm.halt_poll_ns=50
Should I try some different tests?


Looks good to me currently. In my testing, adding idle=poll makes each
vCPU consume almost half of a pCPU's capacity on the host, which is not
suitable for some cloud computing scenarios where vCPUs are heavily
overcommitted on the host.


Regards,
Wanpeng Li


Fwd: [ERROR] INIT: PANIC: segmentation violation! sleeping for 30 seconds.

2015-08-25 Thread Jon Panozzo
KVM Team,

We have a user that is experiencing an odd issue when trying to create
VMs on his system which causes the entire host to crash, printing the
following message to the console:

[ERROR] INIT: PANIC: segmentation violation! sleeping for 30 seconds.

We have never seen this issue before, but the user's hardware is quite
old (circa 2008).  The odd thing is that he is having no issues
utilizing alternative hypervisors such as Microsoft Hyper-V or
XenServer on the same hardware.

System Information:

Linux Kernel 4.0.4
QEMU 2.3.0
CPU Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz (fam: 06, model: 17, stepping: 0a)

What other information can I provide to help track down the root cause
of this issue?  Thank you for your time.

Sincerest Regards,

Jonathan Panozzo


Re: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c

2015-08-25 Thread mani
Kevin Hao haokexin at gmail.com writes:

 
 We use the wrong number of arguments when invoking trace_kvm_stlb_inval,
 which causes the following build error.
 arch/powerpc/kvm/e500_tlb.c: In function 'kvmppc_e500_stlbe_invalidate':
 arch/powerpc/kvm/e500_tlb.c:230: error: too many arguments to function 'trace_kvm_stlb_inval'
 
 Signed-off-by: Kevin Hao haokexin at gmail.com
 ---
  arch/powerpc/kvm/e500_tlb.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
 index 21011e1..1261a21 100644
 --- a/arch/powerpc/kvm/e500_tlb.c
 +++ b/arch/powerpc/kvm/e500_tlb.c
 @@ -226,8 +226,7 @@ static void kvmppc_e500_stlbe_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
 
   kvmppc_e500_shadow_release(vcpu_e500, tlbsel, esel);
   stlbe->mas1 = 0;
 - trace_kvm_stlb_inval(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2,
 -  stlbe->mas3, stlbe->mas7);
 + trace_kvm_stlb_inval(index_of(tlbsel, esel));
  }
 
  static void kvmppc_e500_tlb1_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,




It worked: I was able to build the image after this change. I have one
query: how can I check on PowerPC that KVM is enabled, or that VT is
enabled? Is there any CLI available to check this? Please share it.



Re: I'm now looking into kvm-unit-tests and encounted with some problems.

2015-08-25 Thread Jinjian (Ken)

To Peter,
   Thank you very much!
   I'm sorry, this is my first mail to the kvm list and I have no experience with it.

On 2015/8/26 8:40, Huangpeng (Peter) wrote:

You should add the kvm mailing list too.

-----Original Message-----
From: Jinjian (Ken)
Sent: August 25, 2015 21:42
To: drjo...@redhat.com; pbonz...@redhat.com
Cc: Huangpeng (Peter); Gonglei (Arei); Zhanghailiang
Subject: I'm now looking into kvm-unit-tests and encounted with some problems.

Hi all:
 I'm now looking into kvm-unit-tests and have encountered some problems.

1. When I run run_test.sh, it reports "exec: {config_fd}: not found".
How and where should it be defined?

2. All tests run with -smp 2 (or 3) hang.
For example, run the apic unit test with the following command:
qemu-kvm --enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel 
x86/apic.flat -smp 2 -vnc none
Result: it enters an endless loop.
The related code is:
x86/apic.c
test_sti_nmi()
  on_cpu_async(1, sti_loop, 0);

static void sti_loop(void *ignore)
{
   unsigned k = 0;

   while (sti_loop_active) {
 sti_nop((char *)(ulong)((k++ * 4096) % (128 * 1024 * 1024)));
   }
}

3. The s3 kvm-unit-test hangs.
Run the s3 unit test with the following command:
qemu-kvm --enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel 
x86/s3.flat -vnc none
s3 hangs at the resume event.
Logs:
RSDP is at f62c0
RSDT is at 7fe16a9
FADT is at 7fe0bda
FACS is at 7fe
resume vector addr is 7fe000c
copy resume code from 400350

4. QEMU exits when we run the emulator unit test, and the test still fails
after the problematic code is commented out.
Run the emulator unit test with the following command:
qemu-kvm --enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4  -serial stdio -device pci-testdev 
-kernel x86/emulator.flat -vnc none
Result: QEMU exits during test_muldiv.
Logs:
PASS: imul rax, mem, imm
unhandled cpu excecption 8

If the code that causes QEMU to exit is commented out, the test still fails.
Logs:
FAIL: mov null, %ss

Question:
What do you think is the cause of these problems? I look forward to your reply.

Thank you in advance.




