date:20150825

Re: [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
 @@ -306,6 +354,18 @@ struct dsm_buffer {
  static ram_addr_t dsm_addr;
  static size_t dsm_size;
  
 +struct cmd_out_implemented {

QEMU coding style uses typedef struct {} CamelCase.  Please follow this
convention in all user-defined structs (see ./CODING_STYLE).

  static void dsm_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
  {
 +struct MemoryRegion *dsm_ram_mr = opaque;
 +struct dsm_buffer *dsm;
 +struct dsm_out *out;
 +void *buf;
 +
  assert(val == NOTIFY_VALUE);

The guest should not be able to cause an abort(3).  If val !=
NOTIFY_VALUE we can do nvdebug() and then return.
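
A minimal sketch of that suggestion follows. The NOTIFY_VALUE constant and the
helper name dsm_write_checked() are placeholders for illustration, not taken
from the patch, and nvdebug() is a stand-in for the series' debug macro:

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder value; the real NOTIFY_VALUE is defined by the patch. */
#define NOTIFY_VALUE 0x99

/* Stand-in for the series' nvdebug() helper (GNU-style variadic macro). */
#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)

/*
 * Returns 1 if the write was accepted and 0 if it was ignored.
 * A guest-controlled value is logged and dropped instead of reaching
 * assert()/abort().
 */
static int dsm_write_checked(uint64_t val)
{
    if (val != NOTIFY_VALUE) {
        nvdebug("unexpected notify value %#llx, ignoring\n",
                (unsigned long long)val);
        return 0;
    }
    /* ... handle the DSM request here ... */
    return 1;
}
```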

 +
 +buf = memory_region_get_ram_ptr(dsm_ram_mr);
 +dsm = buf;
 +out = buf;
 +
 +le32_to_cpus(&dsm->handle);
 +le32_to_cpus(&dsm->arg1);
 +le32_to_cpus(&dsm->arg2);

Can SMP guests modify DSM RAM while this thread is running?

We must avoid race conditions.  It's probably better to copy in data
before byte-swapping or checking input values.
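
Roughly what I have in mind, as a sketch only. The struct dsm_header layout,
the 12-byte header size and the max_handle check are simplified assumptions,
not the patch's dsm_buffer:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request header; not the patch's dsm_buffer. */
struct dsm_header {
    uint32_t handle;
    uint32_t arg1;
    uint32_t arg2;
};

/* Decode a little-endian 32-bit value without modifying the source. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Snapshot the guest-visible bytes first, then decode and validate only
 * the private copy.  A later guest write to the shared page cannot change
 * a value between the check and its use.
 * Returns 0 if the handle is in range, -1 otherwise.
 */
static int dsm_copy_in(const void *guest_ram, struct dsm_header *out,
                       uint32_t max_handle)
{
    uint8_t raw[12];

    memcpy(raw, guest_ram, sizeof(raw));
    out->handle = load_le32(raw);
    out->arg1 = load_le32(raw + 4);
    out->arg2 = load_le32(raw + 8);

    return out->handle <= max_handle ? 0 : -1;
}
```

The shared page is never byte-swapped in place, so the guest also never sees
host-endian values in its own buffer.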
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM slow LAMP guest

2015-08-25 Thread Hansa

On 24-8-2015 1:26, Wanpeng Li wrote:

On 8/24/15 3:18 AM, Hansa wrote:

On 16-7-2015 13:27, Paolo Bonzini wrote:

On 15/07/2015 22:02, C. Bröcker wrote:

What OS is this? Is it RHEL/CentOS? If so, halt_poll_ns will be in 6.7
which will be out in a few days/weeks.

Paolo

OK. As said CentOS 6.6.
But where do I put this parameter?

You can add kvm.halt_poll_ns=50 to the kernel command line. If
you have the parameter, you have the
/sys/module/kvm/parameters/halt_poll_ns file.
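
If you would rather check from a program than from the shell, here is a small
sketch. The helper name read_uint_param() is made up for this example; only
the sysfs path comes from the text above:

```c
#include <stdio.h>

/*
 * Read an unsigned module parameter from a sysfs-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 * For halt_poll_ns, a missing file suggests the running kernel does
 * not have the parameter.
 */
static int read_uint_param(const char *path, unsigned int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f) {
        return -1;
    }
    ok = fscanf(f, "%u", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_uint_param("/sys/module/kvm/parameters/halt_poll_ns", &ns) on the
host then tells you both whether the parameter exists and its current value.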

Hi,

I upgraded to the CentOS 6.7 release which came out last month and as promised
the halt_poll_ns parameter was available.
Last week I tested the availability status every 5 minutes on my Wordpress VM's
with the halt_poll_ns kernel param set on DOM0. I'm pleased to announce that it
solves the problem!

How much seconds to load your Wordpress site this time?

Regards,
Wanpeng Li

The average is around 0.4 seconds to load my heaviest site on my slowest
machine.

On the VM server I issued the command below every eleven minutes:

date >> curltest-file; \
top -b -n 1 | sed -n '7,12p' >> curltest-file; \
curl -o /dev/null -s -w "time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file

This gives me the total time for displaying my site on a local machine. It also
includes a 'top' command to display which processes are running at each sample.
All is saved in a file called curltest-file.

I found 7 occurrences in my curltest-file of a time_total larger than 20
seconds. Top however didn't show any significant CPU or IO activity at those
sampled times. Further investigation shows that they are related to a known
(gravatar) issue in the Wordpress Jetpack plugin. I didn't include these
samples in the average total.

Cheers and good luck tweaking your sites!
Best, Hansa

Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
 Changelog:
 - Use little endian for DSM method, thanks for Stefan's suggestion
 
 - introduce a new parameter, @configdata, if it's false, Qemu will
   build a static and readonly namespace in memory and use it serving
   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
   reserved region is needed at the end of the @file, it is good for
   the user who want to pass whole nvdimm device and make its data
   completely be visible to guest
 
 - divide the source code into separated files and add maintain info

I have skipped ACPI patches because I'm not very familiar with that
area.

Have you thought about live migration?

Are the contents of the NVDIMM migrated since they are registered as a
RAM region?

Stefan

Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1

2015-08-25 Thread Marc Zyngier

Hi Thomas,

On 25/08/15 16:46, Thomas Gleixner wrote:
 On Tue, 25 Aug 2015, Marc Zyngier wrote:
 +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 +
  #ifndef MAX_GIC_NR
  #define MAX_GIC_NR  1
  #endif
 @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
  return d->hwirq;
  }
  
 +static inline bool primary_gic_irq(struct irq_data *d)
 +{
 +if (MAX_GIC_NR > 1)
 +return irq_data_get_irq_chip_data(d) == &gic_data[0];
 +
 +return true;
 +}
 +
  /*
   * Routines to acknowledge, disable and enable interrupts
   */
 @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
  
  static void gic_eoi_irq(struct irq_data *d)
  {
 -writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
 +u32 deact_offset = GIC_CPU_EOI;
 +
 +if (static_key_true(&supports_deactivate)) {
 +if (primary_gic_irq(d))
 +deact_offset = GIC_CPU_DEACTIVATE;
 
 I really wonder for the whole series whether you really want all that
 static key dance and extra conditionals in the callbacks instead of
 just using separate irq chips for the different interrupts.

Hmmm. We definitely could have different irqchips between primary and
secondary controllers indeed. We'd still need a static key for the
gic_handle_irq path though, but that's not too bad.

Let me hack something, and I'll come back to you ;-).

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment

2015-08-25 Thread David Matlack

Thanks for writing v2, Wanpeng.

On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote:
 There is a downside of halt_poll_ns since poll is still happen for idle
 VCPU which can waste cpu usage. This patch adds the ability to adjust
 halt_poll_ns dynamically.

What testing have you done with these patches? Do you know if this removes
the overhead of polling in idle VCPUs? Do we lose any of the performance
from always polling?


 There are two new kernel parameters for changing the halt_poll_ns:
 halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter,
 halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally
 rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow
 matrix is suggested by David:

 if (poll successfully for interrupt): stay the same
   else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
   else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

The way you implemented this wasn't what I expected. I thought you would time
the whole function (kvm_vcpu_block). But I like your approach better. It's
simpler and [by inspection] does what we want.


   halt_poll_ns_shrink/ |
   halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
   ---------------------+----------------------+-----------------------
   < 1                  |  = halt_poll_ns      |  = 0
   < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
   otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

I was curious why you went with this approach rather than just the
middle row, or just the last row. Do you think we'll want the extra
flexibility?
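
For reference, here is how I read the three rows of the table, as a
stand-alone sketch rather than the patch's code. The function and parameter
names are mine, "base" stands for the halt_poll_ns module parameter, and the
clamp at zero in the last shrink row is my addition (the table just says
"-="). The halt_poll_ns_max limit is left out:

```c
/*
 * Grow step: a modifier below 1 resets to the base value, a modifier
 * below the base value multiplies, anything else adds.
 */
static unsigned int grow_poll_ns(unsigned int val, unsigned int grow,
                                 unsigned int base)
{
    if (grow < 1)
        return base;
    if (grow < base)
        return val * grow;
    return val + grow;
}

/*
 * Shrink step: a modifier below 1 resets to zero, a modifier below the
 * base value divides, anything else subtracts (clamped at zero here).
 */
static unsigned int shrink_poll_ns(unsigned int val, unsigned int shrink,
                                   unsigned int base)
{
    if (shrink < 1)
        return 0;
    if (shrink < base)
        return val / shrink;
    return val > shrink ? val - shrink : 0;
}
```

Written this way, one integer parameter selects between three behaviours
depending on its magnitude, which is the flexibility I am asking about.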


 Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
 ---
  virt/kvm/kvm_main.c | 65 
 -
  1 file changed, 64 insertions(+), 1 deletion(-)

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 93db833..2a4962b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -66,9 +66,26 @@
  MODULE_AUTHOR("Qumranet");
  MODULE_LICENSE("GPL");

 -static unsigned int halt_poll_ns;
 +#define KVM_HALT_POLL_NS  50
 +#define KVM_HALT_POLL_NS_GROW   2
 +#define KVM_HALT_POLL_NS_SHRINK 0
 +#define KVM_HALT_POLL_NS_MAX 200

The macros are not necessary. Also, hard coding the numbers in the param
definitions will make reading the comments above them easier.

 +
 +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
  module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);

 +/* Default doubles per-vcpu halt_poll_ns. */
 +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
 +module_param(halt_poll_ns_grow, int, S_IRUGO);
 +
 +/* Default resets per-vcpu halt_poll_ns . */
 +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
 +module_param(halt_poll_ns_shrink, int, S_IRUGO);
 +
 +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */

Ah, I misspoke before. I was thinking about round-trip latency. The latency
of a single halt is reduced by about 5-7 us.

 +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
 +module_param(halt_poll_ns_max, int, S_IRUGO);

We can remove halt_poll_ns_max. vcpu->halt_poll_ns can always start at zero
and grow from there. Then we just need one module param to keep
vcpu->halt_poll_ns from growing too large.

[ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max,
  but since halt_poll_ns already exists in upstream kernels, we probably can't
  remove it. ]
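
Something along these lines, as a sketch only. struct vcpu_poll, the
first_step argument and the doubling policy are assumptions I made up to
illustrate the idea; "cap" plays the role of the single remaining module
param:

```c
/* Hypothetical per-vCPU state; the kernel would keep this in kvm_vcpu. */
struct vcpu_poll {
    unsigned int poll_ns;   /* starts at 0: no polling until it pays off */
};

/* Grow from zero to first_step, then double, never exceeding cap. */
static void poll_grow(struct vcpu_poll *v, unsigned int first_step,
                      unsigned int cap)
{
    unsigned int next;

    if (v->poll_ns == 0)
        next = first_step;
    else if (v->poll_ns > cap / 2)
        next = cap;             /* doubling would overshoot the cap */
    else
        next = v->poll_ns * 2;

    v->poll_ns = next < cap ? next : cap;
}

/* Shrink by resetting, so an idle vCPU stops polling entirely. */
static void poll_shrink(struct vcpu_poll *v)
{
    v->poll_ns = 0;
}
```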

 +
  /*
   * Ordering of locks:
   *
 @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, 
 gfn_t gfn)
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

 +static unsigned int __grow_halt_poll_ns(unsigned int val)
 +{
 +   if (halt_poll_ns_grow < 1)
 +   return halt_poll_ns;
 +
 +   val = min(val, halt_poll_ns_max);
 +
 +   if (val == 0)
 +   return halt_poll_ns;
 +
 +   if (halt_poll_ns_grow < halt_poll_ns)
 +   val *= halt_poll_ns_grow;
 +   else
 +   val += halt_poll_ns_grow;
 +
 +   return val;
 +}
 +
 +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)

minimum never gets used.

 +{
 +   if (modifier < 1)
 +   return 0;
 +
 +   if (modifier < halt_poll_ns)
 +   val /= modifier;
 +   else
 +   val -= modifier;
 +
 +   return val;
 +}
 +
 +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)

These wrappers aren't necessary.

 +{
 +   vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
 +}
 +
 +static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
 +{
 +   vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
 +   halt_poll_ns_shrink, halt_poll_ns);
 +}
 +
  static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
  {
 if (kvm_arch_vcpu_runnable(vcpu)) {
 @@ -1954,6 +2013,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)

Re: [PATCH v2 10/18] nvdimm: init the address region used by DSM method

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:03PM +0800, Xiao Guangrong wrote:
 @@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char 
 *buf)
  }
  }
  
 +struct dsm_buffer {
 +/* RAM page. */
 +uint32_t handle;
 +uint8_t arg0[16];
 +uint32_t arg1;
 +uint32_t arg2;
 +union {
 +char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
 +};
 +
 +/* MMIO page. */
 +union {
 +uint32_t notify;
 +char pedding[PAGE_SIZE];

s/pedding/padding/

 +};
 +};
 +
 +static ram_addr_t dsm_addr;
 +static size_t dsm_size;
 +
 +static uint64_t dsm_read(void *opaque, hwaddr addr,
 + unsigned size)
 +{
 +return 0;
 +}
 +
 +static void dsm_write(void *opaque, hwaddr addr,
 +  uint64_t val, unsigned size)
 +{
 +}
 +
 +static const MemoryRegionOps dsm_ops = {
 +.read = dsm_read,
 +.write = dsm_write,
 +.endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static int build_dsm_buffer(void)
 +{
 +MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
 +ram_addr_t addr;;

s/;;/;/

Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
 +#ifdef NVDIMM_DEBUG
 +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
 +#else
 +#define nvdebug(...)
 +#endif

The following allows the compiler to check format strings and syntax
check the argument expressions:

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */
#define nvdebug(fmt, ...) \
    if (NVDIMM_DEBUG) { \
        fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
    }

This approach avoids bitrot (e.g. debug format string arguments have
become outdated).

Re: [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:08PM +0800, Xiao Guangrong wrote:
 Function 4 is used to get Namespace lable size

s/lable/label/

Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-08-25 Thread Andre Przywara

Hi Eric,

On 14/08/15 12:58, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The LPI configuration and pending tables of the GICv3 LPIs are held
 in tables in (guest) memory. To achieve reasonable performance, we
 cache this data in our own data structures, so we need to sync those
 two views from time to time. This behaviour is well described in the
 GICv3 spec and is also exercised by hardware, so the sync points are
 well known.

 Provide functions that read the guest memory and store the
 information from the configuration and pending tables in the kernel.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
 would help to have change log between v1 - v2 (valid for the whole series)
  include/kvm/arm_vgic.h  |   2 +
  virt/kvm/arm/its-emul.c | 124 
 
  virt/kvm/arm/its-emul.h |   3 ++
  3 files changed, 129 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 2a67a10..323c33a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -167,6 +167,8 @@ struct vgic_its {
  int cwriter;
  struct list_head device_list;
  struct list_head collection_list;
 +/* memory used for buffering guest's memory */
 +void *buffer_page;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b9c40d7..05245cb 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -50,6 +50,7 @@ struct its_itte {
  struct its_collection *collection;
  u32 lpi;
  u32 event_id;
 +u8 priority;
  bool enabled;
  unsigned long *pending;
  };
 @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm 
 *kvm, int lpi)
  return NULL;
  }
  
 +#define LPI_PROP_ENABLE_BIT(p)  ((p) & LPI_PROP_ENABLED)
 +#define LPI_PROP_PRIORITY(p)    ((p) & 0xfc)
 +
 +/* stores the priority and enable bit for a given LPI */
 +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 
 prop)
 +{
 +itte->priority = LPI_PROP_PRIORITY(prop);
 +itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
 +}
 +
 +#define GIC_LPI_OFFSET 8192
 +
 +/* We scan the table in chunks the size of the smallest page size */
 4kB chunks?

Marc was complaining about this wording, I think. The rationale was that
4K is already in the code and thus does not need to be repeated in the
comment, whereas the comment should explain the meaning of the value.

 +#define CHUNK_SIZE 4096U
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
  
 +static int nr_idbits_propbase(u64 propbaser)
 +{
 +int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
 +
 +return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
 +}
 +
 +/*
 + * Scan the whole LPI configuration table and put the LPI configuration
 + * data in our own data structures. This relies on the LPI being
 + * mapped before.
 + */
 +static bool its_update_lpis_configuration(struct kvm *kvm)
 +{
 +struct vgic_dist *dist = &kvm->arch.vgic;
 +u8 *prop = dist->its.buffer_page;
 +u32 tsize;
 +gpa_t propbase;
 +int lpi = GIC_LPI_OFFSET;
 +struct its_itte *itte;
 +struct its_device *device;
 +int ret;
 +
 +propbase = BASER_BASE_ADDRESS(dist->propbaser);
 +tsize = nr_idbits_propbase(dist->propbaser);
 +
 +while (tsize > 0) {
 +int chunksize = min(tsize, CHUNK_SIZE);
 +
 +ret = kvm_read_guest(kvm, propbase, prop, chunksize);
 I think you still have the spin_lock issue  since if my understanding is
 correct this is called from
 vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
 where vgic_handle_mmio_access. Or does it take another path?

Well, it's (also) called on handling the INVALL command, but you are
right that on that enable path the dist lock is held. I reckon that this
init part isn't racy so that shouldn't be a problem (famous last words ;-).
Let me see whether I can find a way to just drop the lock around the
while loop.

Cheers,
Andre.

 
 Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
 
 Eric

Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
 +static void set_file(Object *obj, const char *str, Error **errp)
 +{
 +PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
 +
 +if (nvdimm->file) {
 +g_free(nvdimm->file);
 +}

g_free(NULL) is a nop so it's safe to replace the if with just
g_free(nvdimm->file).
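
The same pattern in plain C, as a sketch. It uses libc malloc()/free() as a
stand-in for GLib (free(NULL) is likewise defined to do nothing), and struct
nv_dev and set_file_sketch() are made-up names, not the patch's types:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the device state holding the file path. */
struct nv_dev {
    char *file;
};

/*
 * Replace the stored path.  free(NULL) is a no-op, so the old value can
 * be released without checking it first.
 * Returns 0 on success, -1 if allocation fails.
 */
static int set_file_sketch(struct nv_dev *d, const char *str)
{
    size_t len = strlen(str) + 1;
    char *copy = malloc(len);

    if (!copy) {
        return -1;
    }
    memcpy(copy, str, len);

    free(d->file);          /* fine even when d->file is NULL */
    d->file = copy;
    return 0;
}
```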

Re: [PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-08-25 Thread Andre Przywara

Hi Eric,

On 14/08/15 12:58, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 As the actual LPI number in a guest can be quite high, but is mostly
 assigned using a very sparse allocation scheme, bitmaps and arrays
 for storing the virtual interrupt status are a waste of memory.
 We use our equivalent of the Interrupt Translation Table Entry
 (ITTE) to hold this extra status information for a virtual LPI.
 As the normal VGIC code cannot use it's fancy bitmaps to manage
 pending interrupts, we provide a hook in the VGIC code to let the
 ITS emulation handle the list register queueing itself.
 LPIs are located in a separate number range (>=8192), so
 distinguishing them is easy. With LPIs being only edge-triggered, we
 get away with a less complex IRQ handling.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  2 ++
  virt/kvm/arm/its-emul.c | 71 
 
  virt/kvm/arm/its-emul.h |  3 ++
  virt/kvm/arm/vgic-v3-emul.c |  2 ++
  virt/kvm/arm/vgic.c | 72 
 ++---
  5 files changed, 133 insertions(+), 17 deletions(-)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 1648668..2a67a10 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -147,6 +147,8 @@ struct vgic_vm_ops {
   int (*init_model)(struct kvm *);
   void(*destroy_model)(struct kvm *);
   int (*map_resources)(struct kvm *, const struct vgic_params *);
 + bool(*queue_lpis)(struct kvm_vcpu *);
 + void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
  };

  struct vgic_io_device {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index 7f217fa..b9c40d7 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -50,8 +50,26 @@ struct its_itte {
   struct its_collection *collection;
   u32 lpi;
   u32 event_id;
 + bool enabled;
 + unsigned long *pending;
  };

 +#define for_each_lpi(dev, itte, kvm) \
 + list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
 + list_for_each_entry(itte, &(dev)->itt, itte_list)
 +
 You have a checkpatch error here:
 
 ERROR: Macros with complex values should be enclosed in parentheses
 #52: FILE: virt/kvm/arm/its-emul.c:57:
 +#define for_each_lpi(dev, itte, kvm) \
 +   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) 
 \
 +   list_for_each_entry(itte, &(dev)->itt, itte_list)

I know about that one. The problem is that if I add the parentheses it
breaks the usage below due to the curly brackets. But the definition
above is just so convenient and I couldn't find another neat solution so
far. If you are concerned about that I can give it another try,
otherwise I tend to just ignore checkpatch here.

 +static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
 +{
 can't we have the same LPI present in different interrupt translation
 tables? I don't know it is a sensible setting but I did not succeed in
 finding it was not possible.

Thanks to Marc I am happy (and relieved!) to point you to 6.1.1 LPI INTIDs:
The behavior of the GIC is UNPREDICTABLE if software:
- Maps multiple EventID/DeviceID combinations to the same physical LPI
INTID.

So I exercise the freedom of UNPREDICTABLE here ;-)

 + struct its_device *device;
 + struct its_itte *itte;
 +
 + for_each_lpi(device, itte, kvm) {
 + if (itte->lpi == lpi)
 + return itte;
 + }
 + return NULL;
 +}
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)

  /* The distributor lock is held by the VGIC MMIO handler. */
 @@ -145,6 +163,59 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu 
 *vcpu,
   return false;
  }

 +/*
 + * Find all enabled and pending LPIs and queue them into the list
 + * registers.
 + * The dist lock is held by the caller.
 + */
 +bool vits_queue_lpis(struct kvm_vcpu *vcpu)
 +{
 + struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
 + struct its_device *device;
 + struct its_itte *itte;
 + bool ret = true;
 +
 + if (!vgic_has_its(vcpu->kvm))
 + return true;
 + if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled)
 + return true;
 +
 + spin_lock(&its->lock);
 + for_each_lpi(device, itte, vcpu->kvm) {
 + if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending))
 + continue;
 +
 + if (!itte->collection)
 + continue;
 +
 + if (itte->collection->target_addr != vcpu->vcpu_id)
 + continue;
 +
 + __clear_bit(vcpu->vcpu_id, itte->pending);
 +
 + ret = vgic_queue_irq(vcpu, 0, itte->lpi);
 what if the vgic_queue_irq fails since no LR can be found, the
 itte->pending was cleared so we forget that LPI? shouldn't we restore
 the pending state in ITT? in vgic_queue_hwirq the state change only is
 performed if the

Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Joe Perches

On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
 All fields of kvm_io_range were initialized or copied explicitly
 afterwards. So switch to use kmalloc().

Is there any compiler added alignment padding
in either structure?  If so, those padding
areas would now be uninitialized and may leak
kernel data if copied to user-space.
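
To illustrate the concern in user-space terms, here is a sketch with a
made-up struct; it is not kvm_io_range, and memset() stands in for what
kzalloc() does for the whole allocation:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout chosen to force alignment padding. */
struct range_sketch {
    uint8_t  kind;
    uint64_t addr;
};

/* Bytes in the struct that belong to no member. */
static size_t range_padding(void)
{
    return sizeof(struct range_sketch) - sizeof(uint8_t) - sizeof(uint64_t);
}

/*
 * Clear the whole object before setting the fields.  Assigning the
 * members alone would leave the padding bytes holding whatever the
 * allocator returned; compilers in practice leave the cleared padding
 * untouched by the member stores below.
 */
static void range_init(struct range_sketch *r, uint8_t kind, uint64_t addr)
{
    memset(r, 0, sizeof(*r));
    r->kind = kind;
    r->addr = addr;
}
```

On common ABIs range_padding() is nonzero, which is exactly the space that
field-by-field initialization after kmalloc() would never touch.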

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
[]
 @@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum 
 kvm_bus bus_idx, gpa_t addr,
   if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
   return -ENOSPC;
  
 - new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
 + new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
 sizeof(struct kvm_io_range)), GFP_KERNEL);
   if (!new_bus)
   return -ENOMEM;
 @@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum 
 kvm_bus bus_idx,
   if (r)
   return r;
  
 - new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 + new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 sizeof(struct kvm_io_range)), GFP_KERNEL);
   if (!new_bus)
   return -ENOMEM;




Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
 diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
 index a53d235..7a270a8 100644
 --- a/hw/mem/nvdimm/pc-nvdimm.c
 +++ b/hw/mem/nvdimm/pc-nvdimm.c
 @@ -24,6 +24,19 @@
  
  #include "hw/mem/pc-nvdimm.h"
  
 +#define PAGE_SIZE  (1UL << 12)

This macro name is likely to collide with system headers or other code.

Could you use the existing TARGET_PAGE_SIZE constant instead?

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
 The parameter @file is used as backed memory for NVDIMM which is
 divided into two parts if @dataconfig is true:

s/dataconfig/configdata/

 @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
   set_configdata, NULL);
  }
  
 +static uint64_t get_file_size(int fd)
 +{
 +struct stat stat_buf;
 +uint64_t size;
 +
 +if (fstat(fd, &stat_buf) < 0) {
 +return 0;
 +}
 +
 +if (S_ISREG(stat_buf.st_mode)) {
 +return stat_buf.st_size;
 +}
 +
 +if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
 +return size;
 +}

#ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?

There is nothing Linux-specific about emulating NVDIMMs so this code
should compile on all platforms.

 +
 +return 0;
 +}
 +
  static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
  {
  PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
 +char name[512];
 +void *buf;
 +ram_addr_t addr;
 +uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
 +int fd;
  
  if (!nvdimm->file) {
  error_setg(errp, "file property is not set");
  }

Missing return here.

 +
 +fd = open(nvdimm->file, O_RDWR);

Does it make sense to support read-only NVDIMMs?

It could be handy for sharing a read-only file between unprivileged
guests.  The permissions on the file would only allow read, not write.

 +if (fd < 0) {
 +error_setg(errp, "can not open %s", nvdimm->file);

s/can not/cannot/

 +return;
 +}
 +
 +size = get_file_size(fd);
 +buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
This can be added in the future.

 +if (buf == MAP_FAILED) {
 +error_setg(errp, "can not do mmap on %s", nvdimm->file);
 +goto do_close;
 +}
 +
 +nvdimm->config_data_size = config_size;
 +if (nvdimm->configdata) {
 +/* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. */
 +nvdimm_size = size - config_size;
 +nvdimm->config_data_addr = buf + nvdimm_size;
 +} else {
 +nvdimm_size = size;
 +nvdimm->config_data_addr = NULL;
 +}
 +
 +if ((int64_t)nvdimm_size <= 0) {

The error cases can be detected before mmap(2).  That avoids the int64_t
cast and also avoids nvdimm_size underflow and the bogus
nvdimm->config_data_addr calculation above.

size = get_file_size(fd);
if (size == 0) {
    error_setg(errp, "empty file or unable to get file size");
    goto do_close;
} else if (nvdimm->configdata && size < config_size) {
    error_setg(errp, "file size is too small to store NVDIMM"
                     " configure data");
    goto do_close;
}

 +error_setg(errp, "file size is too small to store NVDIMM"
 +  " configure data");
 +goto do_unmap;
 +}
 +
 +addr = reserved_range_push(nvdimm_size);
 +if (!addr) {
 +error_setg(errp, "do not have enough space for size %#lx.\n", size);

error_setg() messages must not have a newline at the end.

Please use "%#" PRIx64 instead of "%#lx" so compilation works on 32-bit
hosts where sizeof(long) == 4.
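
A small stand-alone sketch of the format string, using snprintf() in place of
error_setg(); the helper name format_size_error() is made up for the example:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a uint64_t portably: PRIx64 expands to the correct length
 * modifier for the host, so this works with 32-bit and 64-bit long.
 * No trailing newline, as error_setg() messages should not have one.
 */
static int format_size_error(char *buf, size_t len, uint64_t size)
{
    return snprintf(buf, len,
                    "do not have enough space for size %#" PRIx64, size);
}
```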

 +goto do_unmap;
 +}
 +
 +nvdimm->device_index = new_device_index();
 +sprintf(name, "NVDIMM-%d", nvdimm->device_index);
 +memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
 +   buf);

How is the autogenerated name used?

Why not just use "pc-nvdimm.memory"?

 +vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
 +memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
 +
 +return;

fd is leaked.

Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-25 Thread Stefan Hajnoczi

On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
 NVDIMM reserves all the free range above 4G to do:
 - Persistent Memory (PMEM) mapping
 - implement NVDIMM ACPI device _DSM method
 
 Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
 ---
  hw/i386/pc.c   | 12 ++--
  hw/mem/nvdimm/pc-nvdimm.c  | 13 +
  include/hw/mem/pc-nvdimm.h |  1 +
  3 files changed, 24 insertions(+), 2 deletions(-)

CCing Igor for memory hotplug-related changes.

 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index 7661ea9..41af6ea 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
 @@ -64,6 +64,7 @@
  #include "hw/pci/pci_host.h"
  #include "acpi-build.h"
  #include "hw/mem/pc-dimm.h"
 +#include "hw/mem/pc-nvdimm.h"
  #include "qapi/visitor.h"
  #include "qapi-visit.h"
  
 @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
  MemoryRegion *ram_below_4g, *ram_above_4g;
  FWCfgState *fw_cfg;
  PCMachineState *pcms = PC_MACHINE(machine);
 +ram_addr_t offset;
  
  assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
  
 @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
  exit(EXIT_FAILURE);
  }
  
 +offset = 0x1ULL + above_4g_mem_size;
 +
  /* initialize hotplug memory address space */
  if (guest_info->has_reserved_memory &&
  (machine->ram_size < machine->maxram_size)) {
 @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
  exit(EXIT_FAILURE);
  }
  
 -pcms->hotplug_memory.base =
 -ROUND_UP(0x1ULL + above_4g_mem_size, 1ULL << 30);
 +pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
  
  if (pcms->enforce_aligned_dimm) {
  /* size hotplug region assuming 1G page max alignment per slot */
 @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
 "hotplug-memory", hotplug_mem_size);
  memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
  &pcms->hotplug_memory.mr);
 +
 +offset = pcms->hotplug_memory.base + hotplug_mem_size;
  }
  
 + /* all the space left above 4G is reserved for NVDIMM. */
 +pc_nvdimm_reserve_range(offset);
 +
  /* Initialize PC system firmware */
  pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
  
 diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
 index a53d235..7a270a8 100644
 --- a/hw/mem/nvdimm/pc-nvdimm.c
 +++ b/hw/mem/nvdimm/pc-nvdimm.c
 @@ -24,6 +24,19 @@
  
  #include "hw/mem/pc-nvdimm.h"
  
 +#define PAGE_SIZE  (1UL << 12)
 +
 +static struct nvdimms_info {
 +ram_addr_t current_addr;
 +} nvdimms_info;
 +
 +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
 +void pc_nvdimm_reserve_range(ram_addr_t offset)
 +{
 +offset = ROUND_UP(offset, PAGE_SIZE);
 +nvdimms_info.current_addr = offset;
 +}
 +
  static char *get_file(Object *obj, Error **errp)
  {
  PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
 diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
 index 51152b8..8601e9b 100644
 --- a/include/hw/mem/pc-nvdimm.h
 +++ b/include/hw/mem/pc-nvdimm.h
 @@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
  #define PC_NVDIMM(obj) \
  OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
  
 +void pc_nvdimm_reserve_range(ram_addr_t offset);
  #endif
 -- 
 2.4.3
 

Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1

2015-08-25 Thread Thomas Gleixner

On Tue, 25 Aug 2015, Marc Zyngier wrote:
 +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 +
  #ifndef MAX_GIC_NR
  #define MAX_GIC_NR   1
  #endif
 @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
   return d->hwirq;
  }
  
 +static inline bool primary_gic_irq(struct irq_data *d)
 +{
 + if (MAX_GIC_NR > 1)
 + return irq_data_get_irq_chip_data(d) == &gic_data[0];
 +
 + return true;
 +}
 +
  /*
   * Routines to acknowledge, disable and enable interrupts
   */
 @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
  
  static void gic_eoi_irq(struct irq_data *d)
  {
 - writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
 + u32 deact_offset = GIC_CPU_EOI;
 +
 + if (static_key_true(&supports_deactivate)) {
 + if (primary_gic_irq(d))
 + deact_offset = GIC_CPU_DEACTIVATE;

I really wonder for the whole series whether you really want all that
static key dance and extra conditionals in the callbacks instead of
just using separate irq chips for the different interrupts.

Thanks,

tglx
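
One possible reading of the suggestion above, as a user-space sketch only:
pick a callback table per interrupt once at setup time, so the EOI path has
no conditional. Every name here (toy_chip, toy_irq, the register offsets) is
hypothetical and is not the kernel's irq_chip API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
enum { TOY_REG_EOI = 0x10, TOY_REG_DEACTIVATE = 0x1000 };

struct toy_irq;

/* One callback table per flavour of interrupt. */
struct toy_chip {
    const char *name;
    uint32_t (*eoi_offset)(const struct toy_irq *irq);
};

struct toy_irq {
    unsigned int hwirq;
    const struct toy_chip *chip;
};

static uint32_t eoi_mode0(const struct toy_irq *irq)
{
    (void)irq;
    return TOY_REG_EOI;
}

static uint32_t eoi_mode1(const struct toy_irq *irq)
{
    (void)irq;
    return TOY_REG_DEACTIVATE;
}

static const struct toy_chip chip_eoimode0 = { "toy-eoimode0", eoi_mode0 };
static const struct toy_chip chip_eoimode1 = { "toy-eoimode1", eoi_mode1 };

/* Decide once, at mapping time, which chip an interrupt uses. */
static void toy_irq_setup(struct toy_irq *irq, unsigned int hwirq,
                          bool primary, bool supports_deactivate)
{
    irq->hwirq = hwirq;
    irq->chip = (primary && supports_deactivate) ? &chip_eoimode1
                                                 : &chip_eoimode0;
}

/* The hot path is a plain indirect call with no extra branches. */
static uint32_t toy_irq_eoi_offset(const struct toy_irq *irq)
{
    return irq->chip->eoi_offset(irq);
}
```

The trade-off is one more ops table per variant in exchange for callbacks
that do not test a static key or the owning GIC on every interrupt.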

Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-08-25 Thread Andre Przywara

Hi Eric,

On 14/08/15 13:35, Eric Auger wrote:
 On 08/14/2015 01:58 PM, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The LPI configuration and pending tables of the GICv3 LPIs are held
 in tables in (guest) memory. To achieve reasonable performance, we
 cache this data in our own data structures, so we need to sync those
 two views from time to time. This behaviour is well described in the
 GICv3 spec and is also exercised by hardware, so the sync points are
 well known.

 Provide functions that read the guest memory and store the
 information from the configuration and pending tables in the kernel.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
 would help to have change log between v1 - v2 (valid for the whole series)
  include/kvm/arm_vgic.h  |   2 +
  virt/kvm/arm/its-emul.c | 124 
 
  virt/kvm/arm/its-emul.h |   3 ++
  3 files changed, 129 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 2a67a10..323c33a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -167,6 +167,8 @@ struct vgic_its {
 int cwriter;
 struct list_head device_list;
 struct list_head collection_list;
 +   /* memory used for buffering guest's memory */
 +   void *buffer_page;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b9c40d7..05245cb 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -50,6 +50,7 @@ struct its_itte {
 struct its_collection *collection;
 u32 lpi;
 u32 event_id;
 +   u8 priority;
 bool enabled;
 unsigned long *pending;
  };
 @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm 
 *kvm, int lpi)
 return NULL;
  }
  
 +#define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
 +#define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
 +
 +/* stores the priority and enable bit for a given LPI */
 +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
 +{
 +   itte->priority = LPI_PROP_PRIORITY(prop);
 +   itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
 +}
 +
 +#define GIC_LPI_OFFSET 8192
 +
 +/* We scan the table in chunks the size of the smallest page size */
 4kB chunks?
 +#define CHUNK_SIZE 4096U
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL)
  
 +static int nr_idbits_propbase(u64 propbaser)
 +{
 +   int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
 +
 +   return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
 +}
 +
 +/*
 + * Scan the whole LPI configuration table and put the LPI configuration
 + * data in our own data structures. This relies on the LPI being
 + * mapped before.
 + */
 +static bool its_update_lpis_configuration(struct kvm *kvm)
 +{
 +   struct vgic_dist *dist = &kvm->arch.vgic;
 +   u8 *prop = dist->its.buffer_page;
 +   u32 tsize;
 +   gpa_t propbase;
 +   int lpi = GIC_LPI_OFFSET;
 +   struct its_itte *itte;
 +   struct its_device *device;
 +   int ret;
 +
 +   propbase = BASER_BASE_ADDRESS(dist->propbaser);
 +   tsize = nr_idbits_propbase(dist->propbaser);
 +
 +   while (tsize > 0) {
 +   int chunksize = min(tsize, CHUNK_SIZE);
 +
 +   ret = kvm_read_guest(kvm, propbase, prop, chunksize);
 I think you still have the spin_lock issue  since if my understanding is
 correct this is called from
 vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
 where vgic_handle_mmio_access. Or does it take another path?

 Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
 
 Sorry I forgot it was the case already. But currently we always register
 the same io ops (registration entry point being
 vgic_register_kvm_io_dev) and maybe we should have separate dispatcher
 function for dist, redist and its?

What would be the idea behind it? To have separate locks for each? I
don't think that will work, as some ITS functions are called from GICv3
register handler functions which manipulate members of the distributor
structure. So I am more in favour of dropping the dist lock in these
cases before handing off execution to ITS specific functions.

Cheers,
Andre.

Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Jason Wang



On 08/25/2015 07:34 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote:
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   arch/x86/kvm/trace.h | 17 +
   arch/x86/kvm/vmx.c   |  1 +
   arch/x86/kvm/x86.c   |  1 +
   3 files changed, 19 insertions(+)
  
  diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
  index 4eae7c3..2d4e81a 100644
  --- a/arch/x86/kvm/trace.h
  +++ b/arch/x86/kvm/trace.h
  @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
   __entry->count > 1 ? "(...)" : "")
   );
   
  +TRACE_EVENT(kvm_fast_mmio,
  +  TP_PROTO(u64 gpa),
  +  TP_ARGS(gpa),
  +
  +  TP_STRUCT__entry(
  +  __field(u64, gpa)
  +  ),
  +
  +  TP_fast_assign(
  +  __entry->gpa = gpa;
  +  ),
  +
  +  TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
  +);
  +
  +
  +
 don't add multiple empty lines please.


Ok

Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang



On 08/25/2015 07:33 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote:
  We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
  and another for KVM_FAST_MMIO_BUS. This leads to two issues:
  
  - kvm_io_bus_destroy() knows nothing about devices on two buses that
    point to a single dev, which leads to a double free [1] during exit.
  - wildcard eventfd ignores data len, so it was registered as a
    kvm_io_range with zero length. This fails the binary search in
    kvm_io_bus_get_first_dev() when we try to emulate through
    KVM_MMIO_BUS, and causes a userspace io emulation request instead
    of an eventfd notification (virtqueue kick will be trapped by qemu
    instead of vhost in this case).
  
  Fix this by not registering wildcard mmio eventfd on two
  buses. Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the
  double free issue of kvm_io_bus_destroy(). For the arch/setups that
  do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
  KVM_FAST_MMIO_BUS first to see if it has a match.
  
  [1] Panic caused by double free:
  
  CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
  #28-Ubuntu
  Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
  task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
  RIP: 0010:[c07e25d8]  [c07e25d8] 
  ioeventfd_release+0x28/0x60 [kvm]
  RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
  RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
  RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
  RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
  R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
  R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
  FS:  7fc1ee3e6700() GS:88023e24() 
  knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
  Stack:
  88021e7cc000  88020e7f3be8 c07e2622
  88020e7f3c38 c07df69a 880232524160 88020e792d80
    880219b78c00 0008 8802321686a8
  Call Trace:
  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
  [811f69f7] __fput+0xe7/0x250
  [811f6bae] fput+0xe/0x10
  [81093f04] task_work_run+0xd4/0xf0
  [81079358] do_exit+0x368/0xa50
  [81082c8f] ? recalc_sigpending+0x1f/0x60
  [81079ad5] do_group_exit+0x45/0xb0
  [81085c71] get_signal+0x291/0x750
  [810144d8] do_signal+0x28/0xab0
  [810f3a3b] ? do_futex+0xdb/0x5d0
  [810b7028] ? __wake_up_locked_key+0x18/0x20
  [810f3fa6] ? SyS_futex+0x76/0x170
  [81014fc9] do_notify_resume+0x69/0xb0
  [817cb9af] int_signal+0x12/0x17
  Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 
  20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 
  b8 00 01 10 00 00
  RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
  RSP 88020e7f3bc8
  
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
 I'm worried that this slows down the regular MMIO.

I doubt whether or not it was measurable.

 Could you share performance #s please?
 You need a mix of len=0 and len=2 matches.

Ok.

 One solution for the first issue is to create two ioeventfd objects instead.

Sounds good.

 For the second issue, we could change bsearch compare function instead.

What do you mean by the second issue?

 Again, this affects all devices, so performance #s would be needed.



Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Joe Perches

On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
 
 On 08/25/2015 11:29 PM, Joe Perches wrote:
  On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
   All fields of kvm_io_range were initialized or copied explicitly
   afterwards. So switch to use kmalloc().
  Is there any compiler added alignment padding
  in either structure?  If so, those padding
  areas would now be uninitialized and may leak
  kernel data if copied to user-space.
 
 I get your concern, but I don't see a way to copy them to userspace, did you?

I didn't look.

I just wanted you to be aware there's a difference
and a reason why kzalloc might be used even though
all structure members are initialized.
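
To make the padding point concrete, here is a small user-space sketch.
struct padded is a made-up layout, not kvm_io_range, and the amount of
padding depends on the ABI (typically 7 bytes on x86-64 for this layout).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up layout: a 1-byte member followed by an 8-byte one usually
 * forces the compiler to insert padding after tag. */
struct padded {
    uint8_t  tag;
    uint64_t value;
};

/* Number of bytes in the struct that belong to no member. */
static size_t padding_bytes(void)
{
    return sizeof(struct padded) - (sizeof(uint8_t) + sizeof(uint64_t));
}

/* Assigning every member does not initialize the padding, so clear the
 * whole object first if it may ever be copied out byte for byte. */
static void init_padded(struct padded *p, uint8_t tag, uint64_t value)
{
    memset(p, 0, sizeof(*p));   /* the zeroing kzalloc() would have done */
    p->tag = tag;
    p->value = value;
}
```

With a non-zeroing allocator and no memset(), the bytes counted by
padding_bytes() keep whatever the allocation previously held.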



Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang



On 08/26/2015 01:45 PM, Joe Perches wrote:
 On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
  
  On 08/25/2015 11:29 PM, Joe Perches wrote:
   On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().
   Is there any compiler added alignment padding
   in either structure?  If so, those padding
   areas would now be uninitialized and may leak
   kernel data if copied to user-space.
  
  I get your concern, but I don't see a way to copy them to userspace, did you?
 I didn't look.

 I just wanted you to be aware there's a difference
 and a reason why kzalloc might be used even though
 all structure members are initialized.


I see, thanks for the reminder. Looks like we are safe, and I will add
something like "kvm_io_range was never accessed by userspace" to the
commit log if there's a new version.


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang



On 08/25/2015 07:51 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
  We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
  and another for KVM_FAST_MMIO_BUS. This leads to two issues:
  
  - kvm_io_bus_destroy() knows nothing about devices on two buses that
    point to a single dev, which leads to a double free [1] during exit.
  - wildcard eventfd ignores data len, so it was registered as a
    kvm_io_range with zero length. This fails the binary search in
    kvm_io_bus_get_first_dev() when we try to emulate through
    KVM_MMIO_BUS, and causes a userspace io emulation request instead
    of an eventfd notification (virtqueue kick will be trapped by qemu
    instead of vhost in this case).
  
  Fix this by not registering wildcard mmio eventfd on two
  buses. Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the
  double free issue of kvm_io_bus_destroy(). For the arch/setups that
  do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
  KVM_FAST_MMIO_BUS first to see if it has a match.
  
  [1] Panic caused by double free:
  
  CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
  #28-Ubuntu
  Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
  task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
  RIP: 0010:[c07e25d8]  [c07e25d8] 
  ioeventfd_release+0x28/0x60 [kvm]
  RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
  RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
  RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
  RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
  R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
  R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
  FS:  7fc1ee3e6700() GS:88023e24() 
  knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
  Stack:
  88021e7cc000  88020e7f3be8 c07e2622
  88020e7f3c38 c07df69a 880232524160 88020e792d80
    880219b78c00 0008 8802321686a8
  Call Trace:
  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
  [811f69f7] __fput+0xe7/0x250
  [811f6bae] fput+0xe/0x10
  [81093f04] task_work_run+0xd4/0xf0
  [81079358] do_exit+0x368/0xa50
  [81082c8f] ? recalc_sigpending+0x1f/0x60
  [81079ad5] do_group_exit+0x45/0xb0
  [81085c71] get_signal+0x291/0x750
  [810144d8] do_signal+0x28/0xab0
  [810f3a3b] ? do_futex+0xdb/0x5d0
  [810b7028] ? __wake_up_locked_key+0x18/0x20
  [810f3fa6] ? SyS_futex+0x76/0x170
  [81014fc9] do_notify_resume+0x69/0xb0
  [817cb9af] int_signal+0x12/0x17
  Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 
  20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 
  b8 00 01 10 00 00
  RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
  RSP 88020e7f3bc8
  
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  Changes from V2:
  - Tweak styles and comment suggested by Cornelia.
  Changes from v1:
  - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
needed to save lots of unnecessary changes.
  ---
   virt/kvm/eventfd.c  | 31 +--
   virt/kvm/kvm_main.c | 16 ++--
   2 files changed, 23 insertions(+), 24 deletions(-)
  
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index 9ff4193..c3ffdc3 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
  _ioeventfd *p)
 return false;
   }
   
  -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
  +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
   {
  -  if (flags & KVM_IOEVENTFD_FLAG_PIO)
  +  if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
 return KVM_PIO_BUS;
  -  if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
  +  if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 return KVM_VIRTIO_CCW_NOTIFY_BUS;
  -  return KVM_MMIO_BUS;
  +  /* When length is ignored, MMIO is put on a separate bus, for
  +   * faster lookups.
  +   */
  +  return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
   }
   
   static int
  @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
  kvm_ioeventfd *args)
 struct eventfd_ctx   *eventfd;
 int   ret;
   
  -  bus_idx = ioeventfd_bus_from_flags(args->flags);
  +  bus_idx = ioeventfd_bus_from_args(args);
 /* must be

Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang



On 08/25/2015 11:29 PM, Joe Perches wrote:
 On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
  All fields of kvm_io_range were initialized or copied explicitly
  afterwards. So switch to use kmalloc().
 Is there any compiler added alignment padding
 in either structure?  If so, those padding
 areas would now be uninitialized and may leak
 kernel data if copied to user-space.


I get your concern, but I don't see a way to copy them to userspace, did you?

Re: [PATCH 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang



On 08/25/2015 11:04 AM, Jason Wang wrote:
[...]
 @@ -900,10 +899,11 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   if (!p->wildcard && p->datamatch != args->datamatch)
   continue;
   
  -kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
   if (!p->length) {
   kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
 &p->dev);
  +} else {
  +kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
   }
  Similar comments here... do you want to check for bus_idx ==
  KVM_MMIO_BUS as well?
  Good catch. I think keeping the original code as is will also be ok to
  solve this (with changing the bus_idx to KVM_FAST_MMIO_BUS during
  registering if it was a wildcard mmio).
  Do you need to handle the ioeventfd_count changes on the fast mmio bus
  as well?
 Yes. So actually, it needs some changes: checking the return value of
 kvm_io_bus_unregister_dev() and deciding which bus the device belongs to.


Looks like it will be cleaner to just change
ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS accordingly. Will
post V2 soon.

Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Cornelia Huck

On Tue, 25 Aug 2015 15:47:14 +0800
Jason Wang jasow...@redhat.com wrote:

 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..95f2901 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   return false;
  }
 
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)

ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)

  {
 - if (flags & KVM_IOEVENTFD_FLAG_PIO)
 + if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
   return KVM_PIO_BUS;
 - if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 + if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
   return KVM_VIRTIO_CCW_NOTIFY_BUS;
 - return KVM_MMIO_BUS;
 + if (args->len)
 + return KVM_MMIO_BUS;
 + return KVM_FAST_MMIO_BUS;

Hm...

/* When length is ignored, MMIO is put on a separate bus, for
 * faster lookups.
 */
return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;

  }
 
  static int

This version of the patch looks nice and compact. Regardless of whether
you want to follow my (minor) style suggestions, consider this patch

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
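
For reference, the selection logic under discussion can be exercised on its
own. This is a sketch with stand-in definitions: the flag bits, bus ids and
struct toy_ioeventfd below are hypothetical placeholders, not the KVM UAPI
definitions.

```c
#include <stdint.h>

/* Placeholder flag bits and bus ids; the real values live in the KVM headers. */
enum { TOY_FLAG_PIO = 1u << 0, TOY_FLAG_CCW_NOTIFY = 1u << 1 };
enum toy_bus { TOY_MMIO_BUS, TOY_PIO_BUS, TOY_CCW_NOTIFY_BUS, TOY_FAST_MMIO_BUS };

struct toy_ioeventfd {
    uint32_t len;    /* 0 means "ignore access length" (wildcard) */
    uint32_t flags;
};

static enum toy_bus toy_bus_from_args(const struct toy_ioeventfd *args)
{
    if (args->flags & TOY_FLAG_PIO)
        return TOY_PIO_BUS;
    if (args->flags & TOY_FLAG_CCW_NOTIFY)
        return TOY_CCW_NOTIFY_BUS;
    /* Zero-length MMIO goes to the separate fast bus. */
    return args->len ? TOY_MMIO_BUS : TOY_FAST_MMIO_BUS;
}
```

Because the bus is derived from the arguments in one place, assign and
deassign paths that both call the helper agree on where a wildcard device
lives.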


[PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

2015-08-25 Thread Jason Wang

All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 virt/kvm/kvm_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..0d79fe8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus 
bus_idx, gpa_t addr,
 if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
 return -ENOSPC;
 
-   new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
+   new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
  sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
@@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum 
kvm_bus bus_idx,
if (r)
return r;
 
-   new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
+   new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
  sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
-- 
2.1.4


[PATCH V2 3/3] kvm: add tracepoint for fast mmio

2015-08-25 Thread Jason Wang

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 arch/x86/kvm/trace.h | 17 +
 arch/x86/kvm/vmx.c   |  1 +
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..2d4e81a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
  __entry->count > 1 ? "(...)" : "")
 );
 
+TRACE_EVENT(kvm_fast_mmio,
+   TP_PROTO(u64 gpa),
+   TP_ARGS(gpa),
+
+   TP_STRUCT__entry(
+   __field(u64, gpa)
+   ),
+
+   TP_fast_assign(
+   __entry->gpa = gpa;
+   ),
+
+   TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
+);
+
+
+
 /*
  * Tracepoint for cpuid.
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..a55d279 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
skip_emulated_instruction(vcpu);
+   trace_kvm_fast_mmio(gpa);
return 1;
}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..36cf78e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
-- 
2.1.4


[PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang

We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
and another for KVM_FAST_MMIO_BUS. This leads to two issues:

- kvm_io_bus_destroy() knows nothing about devices on two buses that
  point to a single dev, which leads to a double free [1] during exit.
- wildcard eventfd ignores data len, so it was registered as a
  kvm_io_range with zero length. This fails the binary search in
  kvm_io_bus_get_first_dev() when we try to emulate through
  KVM_MMIO_BUS, and causes a userspace io emulation request instead
  of an eventfd notification (virtqueue kick will be trapped by qemu
  instead of vhost in this case).

Fix this by not registering wildcard mmio eventfd on two
buses. Instead, only register it on KVM_FAST_MMIO_BUS. This fixes the
double free issue of kvm_io_bus_destroy(). For the arch/setups that
do not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
KVM_FAST_MMIO_BUS first to see if it has a match.
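
The failed binary search described above can be reproduced in user space.
The sketch assumes a comparator that reports a match only when the looked-up
range lies entirely inside the registered one; struct range, range_cmp() and
find_range() are illustrative names, not the kernel's definitions.

```c
#include <stdint.h>
#include <stdlib.h>

struct range {
    uint64_t addr;
    int      len;
};

/* key matches elem only if [key.addr, key.addr + key.len) fits inside elem. */
static int range_cmp(const void *k, const void *e)
{
    const struct range *key = k, *elem = e;

    if (key->addr < elem->addr)
        return -1;
    if (key->addr + key->len > elem->addr + elem->len)
        return 1;
    return 0;
}

/* Returns the index of the matching range in a sorted table, or -1. */
static int find_range(const struct range *tbl, size_t n, uint64_t addr, int len)
{
    struct range key = { addr, len };
    const struct range *hit = bsearch(&key, tbl, n, sizeof(*tbl), range_cmp);

    return hit ? (int)(hit - tbl) : -1;
}
```

With an entry registered as { addr, 0 }, a lookup for a 2-byte access at the
same address compares greater than the entry and is never matched; only a
zero-length lookup finds it, which is what the fast MMIO path issues.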

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
RIP: 0010:[c07e25d8]  [c07e25d8] 
ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
FS:  7fc1ee3e6700() GS:88023e24() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
Stack:
88021e7cc000  88020e7f3be8 c07e2622
88020e7f3c38 c07df69a 880232524160 88020e792d80
  880219b78c00 0008 8802321686a8
Call Trace:
[c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
[c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
[c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
[811f69f7] __fput+0xe7/0x250
[811f6bae] fput+0xe/0x10
[81093f04] task_work_run+0xd4/0xf0
[81079358] do_exit+0x368/0xa50
[81082c8f] ? recalc_sigpending+0x1f/0x60
[81079ad5] do_group_exit+0x45/0xb0
[81085c71] get_signal+0x291/0x750
[810144d8] do_signal+0x28/0xab0
[810f3a3b] ? do_futex+0xdb/0x5d0
[810b7028] ? __wake_up_locked_key+0x18/0x20
[810f3fa6] ? SyS_futex+0x76/0x170
[81014fc9] do_notify_resume+0x69/0xb0
[817cb9af] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 
10 00 00
RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
RSP 88020e7f3bc8

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
  needed to save lots of unnecessary changes.
---
 virt/kvm/eventfd.c  | 30 --
 virt/kvm/kvm_main.c | 16 ++--
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..95f2901 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
_ioeventfd *p)
return false;
 }
 
-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
 {
-   if (flags & KVM_IOEVENTFD_FLAG_PIO)
+   if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
 return KVM_PIO_BUS;
-   if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+   if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 return KVM_VIRTIO_CCW_NOTIFY_BUS;
-   return KVM_MMIO_BUS;
+   if (args->len)
+   return KVM_MMIO_BUS;
+   return KVM_FAST_MMIO_BUS;
 }
 
 static int
@@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
struct eventfd_ctx   *eventfd;
int   ret;
 
-   bus_idx = ioeventfd_bus_from_flags(args->flags);
+   bus_idx = ioeventfd_bus_from_flags(args);
 /* must be natural-word sized, or 0 to ignore length */
 switch (args->len) {
case 0:
@@ -843,16 +845,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
 if (ret < 0)
goto unlock_fail;
 
-   /* When length is ignored, MMIO is also put on a separate bus, for
-* faster lookups.
-*/
-   if (!args->len && !(args->flags
