Re: [PATCH v6 6/6] KVM: Use eoi to track RTC interrupt delivery status

2013-03-24 Thread Gleb Natapov
On Fri, Mar 22, 2013 at 10:50:55AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-03-22:
  On Fri, Mar 22, 2013 at 08:51:47AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-03-22:
  On Fri, Mar 22, 2013 at 08:37:22AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-03-22:
  On Fri, Mar 22, 2013 at 08:25:21AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-03-22:
  On Fri, Mar 22, 2013 at 08:05:27AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-03-22:
  On Fri, Mar 22, 2013 at 01:24:05PM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Current interrupt coalescing logic, which is only used by the RTC, has
  a conflict with Posted Interrupts. This patch introduces a new
  mechanism that uses EOI to track interrupts: when delivering an
  interrupt to a vcpu, pending_eoi is set to the number of vcpus that
  received the interrupt, and it is decreased as each vcpu writes EOI.
  No subsequent RTC interrupt can be delivered to a vcpu until all
  vcpus have written EOI.
  
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   virt/kvm/ioapic.c |   40 +++-
   1 files changed, 39 insertions(+), 1 deletions(-)
  diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
  index c991e58..df16daf 100644
  --- a/virt/kvm/ioapic.c
  +++ b/virt/kvm/ioapic.c
  @@ -114,6 +114,29 @@ static void rtc_irq_restore(struct kvm_ioapic *ioapic)
  	ioapic->rtc_status.pending_eoi = pending_eoi;
   }
  +static void rtc_irq_ack_eoi(struct kvm_vcpu *vcpu,
  +  struct rtc_status *rtc_status, int irq)
  +{
  +  if (irq != RTC_GSI)
  +  return;
  +
   +  if (test_and_clear_bit(vcpu->vcpu_id, rtc_status->dest_map))
   +  --rtc_status->pending_eoi;
   +
   +  WARN_ON(rtc_status->pending_eoi < 0);
  +}
  +
  +static bool rtc_irq_check(struct kvm_ioapic *ioapic, int irq)
  +{
  +  if (irq != RTC_GSI)
  +  return false;
  +
   +  if (ioapic->rtc_status.pending_eoi > 0)
  +  return true; /* coalesced */
  +
  +  return false;
  +}
  +
    static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
   {
 union kvm_ioapic_redirect_entry *pent;
   @@ -229,6 +252,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
    {
   	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 struct kvm_lapic_irq irqe;
  +  int ret;
  
  	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
  		     "vector=%x trig_mode=%x\n",
   @@ -244,7 +268,14 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
   	irqe.level = 1;
   	irqe.shorthand = 0;
   -	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe, NULL);
   +	if (irq == RTC_GSI) {
   +		ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe,
   +				&ioapic->rtc_status.dest_map);
   +		ioapic->rtc_status.pending_eoi = ret;
  We should track status only if the IRQ_STATUS ioctl was used to inject
  an interrupt.
  We already know RTC will use IRQ_STATUS ioctl. Why check it again?
  
  QEMU does. QEMU is not the only userspace.
  And this will break other userspace.
  
  How?
  If other userspace has the reinjection logic for RTC but does not use
  IRQ_STATUS, then it cannot get the right coalescing info. If it also
  uses IRQ_STATUS to get coalescing info, then we don't need the
  IRQ_STATUS check.
  
  If userspace does not care about irq status, it does not use the
  IRQ_STATUS ioctl, and we should not go the extra mile to provide one.
  Not everyone cares about running Windows as a guest.
  I see your point. But if no Windows guest is running, the RTC is hardly
  used by other guests and the overhead can be ignored.
  
  Anyone can use the RTC in a Linux guest. Don't know about others.
 I see.
 Since passing IRQ_STATUS down to the ioapic needs changes in many
 functions, how about adding a variable to rtc_status:
 struct rtc_status {
 	bool IRQ_STATUS;
 };
 
 And set it in kvm_vm_ioctl():
 case KVM_IRQ_LINE_STATUS:
 	if (irq == RTC_GSI && ioapic)
 		ioapic->rtc_status.IRQ_STATUS = true;
 
I do not like that kind of hack. Just put the function changes in a
separate, easy-to-review patch.
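
(For reference, the userspace side being discussed: a minimal sketch of
injecting the RTC interrupt with KVM_IRQ_LINE_STATUS, using the standard
KVM API; error handling and the edge-trigger lowering are trimmed.)

    #include <err.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Returns the delivery status; 0 means the interrupt was coalesced. */
    static int inject_rtc_irq(int vm_fd, unsigned int gsi)
    {
            struct kvm_irq_level event = { .irq = gsi, .level = 1 };

            if (ioctl(vm_fd, KVM_IRQ_LINE_STATUS, &event) < 0)
                    err(1, "KVM_IRQ_LINE_STATUS");
            /* status shares storage with irq in a union */
            return event.status;
    }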

--
Gleb.


Re: [PULL 0/7] ppc patch queue 2013-03-22

2013-03-24 Thread Gleb Natapov
On Fri, Mar 22, 2013 at 03:25:56PM +0100, Alexander Graf wrote:
 Hi Marcelo / Gleb,
 
 This is my current patch queue for ppc.  Please pull.
 
 Changes include:
 
   - race-free in-kernel watchdog handling api
   - e500 mmu fixes
 
 Alex
 
Pulled, thanks.

 
 The following changes since commit 2ae33b389601b86a3d0cfe2d09f5e3189d5322fd:
   Marcelo Tosatti (1):
 Merge remote-tracking branch 'upstream/master' into queue
 
 are available in the git repository at:
 
   git://github.com/agraf/linux-2.6.git kvm-ppc-next
 
 Bharat Bhushan (3):
   KVM: PPC: move tsr update in a separate function
   KVM: PPC: Added one_reg interface for timer registers
   KVM: PPC: booke: Added debug handler
 
 Paul Mackerras (1):
   KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external
 
 Scott Wood (3):
   kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid
   kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit
   kvm/ppc/e500: eliminate tlb_refs
 
  Documentation/virtual/kvm/api.txt   |4 ++
  arch/powerpc/include/asm/kvm_host.h |1 +
  arch/powerpc/include/asm/kvm_ppc.h  |3 +-
  arch/powerpc/include/uapi/asm/kvm.h |5 ++
  arch/powerpc/kernel/asm-offsets.c   |1 +
  arch/powerpc/kvm/book3s.c   |3 +-
  arch/powerpc/kvm/booke.c|   57 +++-
  arch/powerpc/kvm/booke_interrupts.S |   42 -
  arch/powerpc/kvm/e500.h |   24 +++---
  arch/powerpc/kvm/e500_mmu_host.c|   84 +++---
  arch/powerpc/kvm/powerpc.c  |2 +-
  11 files changed, 133 insertions(+), 93 deletions(-)

--
Gleb.


Re: [PATCH] KVM: allow host header to be included even for !CONFIG_KVM

2013-03-24 Thread Gleb Natapov
On Thu, Mar 21, 2013 at 05:02:15PM -0700, Kevin Hilman wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Thu, Mar 21, 2013 at 02:33:13PM -0500, Scott Wood wrote:
  On 03/21/2013 02:16:00 PM, Gleb Natapov wrote:
  On Thu, Mar 21, 2013 at 01:42:34PM -0500, Scott Wood wrote:
   On 03/21/2013 09:27:14 AM, Kevin Hilman wrote:
   Gleb Natapov g...@redhat.com writes:
   
 On Wed, Mar 20, 2013 at 06:58:41PM -0500, Scott Wood wrote:
 Why can't the entirety of kvm_host.h be included regardless of
 CONFIG_KVM, just like most other feature-specific headers?  Why
 can't the if/else just go around the functions that you want to
 stub out for non-KVM builds?
   
Kevin,
   
 Which compilation failure does this patch fix? I presume something
 ARM-related.
   
    Not specifically ARM related, but more context-tracking related,
    since kernel/context_tracking.c pulls in kvm_host.h, which attempts
    to pull in asm/kvm*.h, which may not exist on some platforms.
   
    At least for ARM, KVM support was added in v3.9, so this patch can
    probably be dropped since the non-KVM builds on ARM now work. But any
    platform without the asm/kvm*.h will still be broken when trying to
    build the context tracker.
  
   Maybe other platforms should get empty asm/kvm*.h files.  Is there
   anything from those files that the linux/kvm*.h headers need to
   build?
  
  arch things. kvm_vcpu_arch, kvm_arch_memory_slot, kvm_arch etc.
  
  Could define them as empty structs.
  
   Isn't it simpler for kernel/context_tracking.c to define empty
   __guest_enter()/__guest_exit() if !CONFIG_KVM?
 
 I proposed something like that in an earlier version but Frederic asked
 me to propose a fix to the KVM headers instead.
 
 Just in case fixing the context tracking subsystem is preferred, 
 the patch below fixes the problem also.
 
The patch that broke the PPC build was reverted. Frederic, can you pick
up the patch below?

 Kevin
 
 From f22995a262144d0d61705fa72134694d911283eb Mon Sep 17 00:00:00 2001
 From: Kevin Hilman khil...@linaro.org
 Date: Thu, 21 Mar 2013 16:57:14 -0700
 Subject: [PATCH] context_tracking: fix !CONFIG_KVM compile: add stub guest
  enter/exit
 
 When KVM is not enabled, or not available on a platform, the KVM
 headers should not be included.  Instead, just define stub
 __guest_[enter|exit] functions.
 
 Cc: Frederic Weisbecker fweis...@gmail.com
 Signed-off-by: Kevin Hilman khil...@linaro.org
 ---
  kernel/context_tracking.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)
 
 diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
 index 65349f0..64b0f80 100644
 --- a/kernel/context_tracking.c
 +++ b/kernel/context_tracking.c
 @@ -15,12 +15,18 @@
   */
  
   #include <linux/context_tracking.h>
  -#include <linux/kvm_host.h>
   #include <linux/rcupdate.h>
   #include <linux/sched.h>
   #include <linux/hardirq.h>
   #include <linux/export.h>
   
  +#if IS_ENABLED(CONFIG_KVM)
  +#include <linux/kvm_host.h>
  +#else
  +#define __guest_enter()
  +#define __guest_exit()
  +#endif
 +
  DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
  #ifdef CONFIG_CONTEXT_TRACKING_FORCE
   .active = true,
 -- 
 1.8.2

--
Gleb.


RE: [PATCH] KVM: x86: Avoid busy loops over uninjectable pending APIC timers

2013-03-24 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-03-22:
 On Fri, Mar 22, 2013 at 07:43:03AM -0300, Marcelo Tosatti wrote:
 On Fri, Mar 22, 2013 at 08:53:15AM +0200, Gleb Natapov wrote:
 On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
 On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
 On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
  But current PI patches do break them, that's my point. So we
  either need to revise them again, or drop LAPIC timer
  reinjection. Making apic_accept_irq's semantics "it returns
  coalescing info, but only sometimes" is dubious though.
  We may roll back to the initial idea: test both IRR and PIR to get
  coalescing info. In this case, always inject the LAPIC timer in vcpu
  context, so apic_accept_irq() will return the right coalescing info.
  Also, we need to add comments telling the caller that apic_accept_irq()
  can ensure the return value is correct only when the caller is in the
  target vcpu's context.
 
  We cannot touch IRR while the vcpu is in non-root operation, so we
  will have to pass a flag to apic_accept_irq() to let it know that it
  is called synchronously. While all this is possible, I want to know
  which guests exactly we will break if we do not track interrupt
  coalescing for the LAPIC timer. If only 2.0 SMP kernels break, we
  can probably drop it.
 
 RHEL4 / RHEL5 guests.
  RHEL5 has kvmclock, no? We should not break RHEL4 though.
 
 kvmclock provides no timer interrupt... either LAPIC or PIT must be used
 with kvmclock.
  I am confused now. If the LAPIC is not used for wallclock time keeping,
  but only for scheduling, the reinjection is actually harmful.
  Reinjecting the interrupt will cause needless task rescheduling. So the
  question is if there is a Linux kernel that uses the LAPIC for
  wallclock time keeping and relies on an accurate number of injected
  interrupts to avoid time drift.
 
 See 4acd47cfea9c18134e0cbf915780892ef0ff433a on RHEL5, RHEL5 kernels
 before that commit did not reinject.  Which means that all non-RHEL
 Linux guests based on that upstream code also suffer from the same
 problem.
 
  The commit actually fixes the guest, not the host. The existence of the
  commit also means that LAPIC timer reinjection does not solve the
  problem and all guests without this commit will suffer from the bug
  regardless of what we decide to do here. Without LAPIC timer
  reinjection the effect of the bug will be much more visible and
  long-lasting though.
 
  Also, any other algorithm which uses LAPIC timers and compares them
  with other clocks (such as the NMI watchdog) is potentially vulnerable.
  They are with or without timer reinjection, as the commit you pointed
  to shows.
 
 
 Can drop it, and then wait until someone complains (if so).
 
  Yes, tough decision to make. All the complaints will be guest bugs
  which can be hit without reinjection too, but with less probability.
  The reason we are so keen on keeping RTC reinjection is that the guests
  that depend on it cannot be fixed.
 
  Knowing that Linux tends to disable interrupts, it is likely that it
  tries to detect and compensate for missing interrupts.
 
 As said above, any algorithm which compares LAPIC timer interrupt with
 another clock is vulnerable.
Any conclusion? 

Best regards,
Yang



Re: [PATCH] KVM: allow host header to be included even for !CONFIG_KVM

2013-03-24 Thread Frederic Weisbecker
2013/3/21 Gleb Natapov g...@redhat.com:
 Isn't it simpler for kernel/context_tracking.c to define empty
 __guest_enter()/__guest_exit() if !CONFIG_KVM?

That doesn't look right. Off-cases are usually handled from the
headers, right? So that we avoid ifdeffery ugliness in core code.


Re: [PATCH] KVM: allow host header to be included even for !CONFIG_KVM

2013-03-24 Thread Gleb Natapov
On Sun, Mar 24, 2013 at 02:44:26PM +0100, Frederic Weisbecker wrote:
 2013/3/21 Gleb Natapov g...@redhat.com:
  Isn't it simpler for kernel/context_tracking.c to define empty
  __guest_enter()/__guest_exit() if !CONFIG_KVM?
 
 That doesn't look right. Off-cases are usually handled from the
 headers, right? So that we avoid ifdeffery ugliness in core code.
Let's put it in the linux/context_tracking.h header then.
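
(A minimal sketch of that suggestion, assuming the stubs go into
include/linux/context_tracking.h; the exact placement and stub form are
illustrative, not the final patch:)

    #if IS_ENABLED(CONFIG_KVM)
    #include <linux/kvm_host.h>
    #else
    static inline void __guest_enter(void) { }
    static inline void __guest_exit(void) { }
    #endif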

--
Gleb.


Re: [PATCH V2 1/3] tcm_vhost: Wait for pending requests in vhost_scsi_flush()

2013-03-24 Thread Michael S. Tsirkin
On Fri, Mar 22, 2013 at 02:55:27PM +0800, Asias He wrote:
 This patch makes vhost_scsi_flush() wait for all the pending requests
 issued before the flush operation to finish.
 
 Signed-off-by: Asias He as...@redhat.com
 ---
  drivers/vhost/tcm_vhost.c | 117 
 ++
  drivers/vhost/tcm_vhost.h |   4 ++
  2 files changed, 101 insertions(+), 20 deletions(-)
 
 diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
 index e734ead..dc0af52 100644
 --- a/drivers/vhost/tcm_vhost.c
 +++ b/drivers/vhost/tcm_vhost.c
 @@ -82,6 +82,15 @@ struct vhost_scsi {
  
   bool vs_events_dropped; /* any missed events, protected by dev.mutex */
   u64 vs_events_nr; /* num of pending events, protected by dev.mutex */
 +
 + /*
 +  * vs_inflight[0]/[1] are used to track requests issued
 +  * before/during the flush operation
 +  */
 + u64 vs_inflight[2];
 + wait_queue_head_t vs_flush_wait; /* wait queue for flush operation */
 + spinlock_t vs_flush_lock; /* lock to protect vs_during_flush */
 + int vs_during_flush; /* flag to indicate if we are in flush operation */

So this adds a spinlock on the data path and I'm not sure I understand
why this is correct (see also the comment below).
And generally we should try to avoid reimplementing refcounting.  How
about we use a kref pointer instead?  Access can use RCU (or maybe put
it in vq->private and use the tricky vhost version of RCU).  Initialize
it to 1.  To flush you replace the pointer, decrement, then wait for the
refcount to reach 0.

This still adds atomics on the data path, so it may be worth
benchmarking to verify the performance overhead is not measurable, but
at least it's one atomic and not a full lock.
Hmm?
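
(Roughly, an untested sketch of that kref scheme; the struct and
function names here are invented for illustration:)

    #include <linux/completion.h>
    #include <linux/kref.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct vhost_scsi_inflight {
            struct kref kref;       /* +1 base ref, +1 per outstanding request */
            struct completion comp; /* completed when the refcount hits zero */
    };

    static void vhost_scsi_inflight_release(struct kref *kref)
    {
            struct vhost_scsi_inflight *inflight =
                    container_of(kref, struct vhost_scsi_inflight, kref);

            complete(&inflight->comp);
    }

    /* Each request does kref_get() on submit and kref_put() on completion. */
    static void vhost_scsi_flush_inflight(struct vhost_scsi_inflight __rcu **p)
    {
            struct vhost_scsi_inflight *old, *new;

            /* Allocation failure handling omitted in this sketch. */
            new = kzalloc(sizeof(*new), GFP_KERNEL);
            kref_init(&new->kref);          /* starts at 1: the base reference */
            init_completion(&new->comp);

            old = rcu_dereference_protected(*p, true);
            rcu_assign_pointer(*p, new);    /* new requests use the new tracker */
            synchronize_rcu();

            /* Drop the base ref, then wait for outstanding requests to drain. */
            kref_put(&old->kref, vhost_scsi_inflight_release);
            wait_for_completion(&old->comp);
            kfree(old);
    }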


  };
  
  /* Local pointer to allocated TCM configfs fabric module */
 @@ -99,6 +108,46 @@ static int iov_num_pages(struct iovec *iov)
   ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
  }
  
 +static int tcm_vhost_inc_inflight(struct vhost_scsi *vs)
 +{
 + int during_flush;
 +
  + spin_lock(&vs->vs_flush_lock);
  + during_flush = vs->vs_during_flush;
  + vs->vs_inflight[during_flush]++;
  + spin_unlock(&vs->vs_flush_lock);
 +
 + return during_flush;
 +}
 +
 +static void tcm_vhost_dec_inflight(struct vhost_scsi *vs, int during_flush)
 +{
 + u64 inflight;
 +
  + spin_lock(&vs->vs_flush_lock);
  + inflight = vs->vs_inflight[during_flush]--;
  + /*
  +  * Wakeup the waiter when all the requests issued before the flush
  +  * operation are finished and we are during the flush operation.
  +  */
  + if (!inflight && !during_flush && vs->vs_during_flush)
  + 	wake_up(&vs->vs_flush_wait);
  + spin_unlock(&vs->vs_flush_lock);
 +}
 +
 +static bool tcm_vhost_done_inflight(struct vhost_scsi *vs)
 +{
 + bool ret = false;
 +
 + /* The requests issued before the flush operation are finished ? */
  + spin_lock(&vs->vs_flush_lock);
  + if (!vs->vs_inflight[0])
  + 	ret = true;
  + spin_unlock(&vs->vs_flush_lock);
 +
 + return ret;
 +}
 +
  static bool tcm_vhost_check_feature(struct vhost_scsi *vs, int feature)
  {
   bool ret = false;
 @@ -391,6 +440,8 @@ static int tcm_vhost_queue_tm_rsp(struct se_cmd *se_cmd)
  
  static void tcm_vhost_free_evt(struct vhost_scsi *vs, struct tcm_vhost_evt 
 *evt)
  {
 +
  + tcm_vhost_dec_inflight(vs, evt->during_flush);
  	mutex_lock(&vs->dev.mutex);
  	vs->vs_events_nr--;
   kfree(evt);
  @@ -412,6 +463,7 @@ static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
  	if (evt) {
  		evt->event.event = event;
  		evt->event.reason = reason;
  +		evt->during_flush = tcm_vhost_inc_inflight(vs);
  		vs->vs_events_nr++;
  	}
  	mutex_unlock(&vs->dev.mutex);
  @@ -422,6 +474,7 @@ static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
  static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
  {
  	struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
  +	struct vhost_scsi *vs = tv_cmd->tvc_vhost;
   
  	/* TODO locking against target/backend threads? */
  	transport_generic_free_cmd(se_cmd, 1);
  @@ -434,13 +487,16 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
  		kfree(tv_cmd->tvc_sgl);
  	}
   
  +	tcm_vhost_dec_inflight(vs, tv_cmd->during_flush);
 +
   kfree(tv_cmd);
  }
  
   static void tcm_vhost_do_evt_work(struct vhost_scsi *vs,
  -	struct virtio_scsi_event *event)
  +	struct tcm_vhost_evt *evt)
   {
  	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT];
  +	struct virtio_scsi_event *event = &evt->event;
  	struct virtio_scsi_event __user *eventp;
   unsigned out, in;
   int head, ret;
 @@ -503,7 +559,7 @@ static void tcm_vhost_evt_work(struct vhost_work *work)
   while (llnode) {
   evt = llist_entry(llnode, struct tcm_vhost_evt, list);
   llnode = llist_next(llnode);
 - 

Re: [PATCH V4 2/2] tcm_vhost: Add hotplug/hotunplug support

2013-03-24 Thread Michael S. Tsirkin
On Fri, Mar 22, 2013 at 01:39:05PM +0800, Asias He wrote:
 In commit 365a7150094 ([SCSI] virtio-scsi: hotplug support for
 virtio-scsi), hotplug support is added to virtio-scsi.
 
 This patch adds hotplug and hotunplug support to tcm_vhost.
 
 You can create or delete a LUN in targetcli to hotplug or hotunplug a
 LUN in guest.
 
 Changes in v4:
 - Drop tcm_vhost_check_endpoint in tcm_vhost_send_evt
 - Add tcm_vhost_check_endpoint in vhost_scsi_evt_handle_kick
 
 Changes in v3:
 - Separate the bug fix to another thread
 
 Changes in v2:
 - Remove code duplication in tcm_vhost_{hotplug,hotunplug}
 - Fix racing of vs_events_nr
 - Add flush fix patch to this series
 
 Signed-off-by: Asias He as...@redhat.com
 Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  drivers/vhost/tcm_vhost.c | 212 
 --
  drivers/vhost/tcm_vhost.h |  10 +++
  2 files changed, 217 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
 index d81e3a9..e734ead 100644
 --- a/drivers/vhost/tcm_vhost.c
 +++ b/drivers/vhost/tcm_vhost.c
 @@ -62,6 +62,9 @@ enum {
  
   #define VHOST_SCSI_MAX_TARGET	256
   #define VHOST_SCSI_MAX_VQ	128
  +#define VHOST_SCSI_MAX_EVENT	128
  +
  +#define VHOST_SCSI_FEATURES (VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG))
  
  struct vhost_scsi {
  	/* Protected by vhost_scsi->dev.mutex */
 @@ -73,6 +76,12 @@ struct vhost_scsi {
  
   struct vhost_work vs_completion_work; /* cmd completion work item */
   struct llist_head vs_completion_list; /* cmd completion queue */
 +
 + struct vhost_work vs_event_work; /* evt injection work item */
 + struct llist_head vs_event_list; /* evt injection queue */
 +
 + bool vs_events_dropped; /* any missed events, protected by dev.mutex */
 + u64 vs_events_nr; /* num of pending events, protected by dev.mutex */

Whoa, u64. We allow up to 128 of these; an int will do.

  };
  
  /* Local pointer to allocated TCM configfs fabric module */
  @@ -120,6 +129,16 @@ static bool tcm_vhost_check_endpoint(struct vhost_virtqueue *vq)
  	return ret;
  }
  
 +static bool tcm_vhost_check_events_dropped(struct vhost_scsi *vs)
 +{
 + bool ret;
 +
  + mutex_lock(&vs->dev.mutex);
  + ret = vs->vs_events_dropped;
  + mutex_unlock(&vs->dev.mutex);
 +
 + return ret;
 +}
  static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
  {
   return 1;
 @@ -370,6 +389,36 @@ static int tcm_vhost_queue_tm_rsp(struct se_cmd *se_cmd)
   return 0;
  }
  
  +static void tcm_vhost_free_evt(struct vhost_scsi *vs, struct tcm_vhost_evt *evt)
  +{
  + mutex_lock(&vs->dev.mutex);
  + vs->vs_events_nr--;
  + kfree(evt);
  + mutex_unlock(&vs->dev.mutex);
 +}
 +
 +static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
 + u32 event, u32 reason)
 +{
 + struct tcm_vhost_evt *evt;
 +
  + mutex_lock(&vs->dev.mutex);
  + if (vs->vs_events_nr > VHOST_SCSI_MAX_EVENT) {


Set the events-dropped flag here and stop worrying about it in callers?
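
(I.e., roughly — an untested sketch of that suggestion, reusing the
names from the patch above:)

    static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
            u32 event, u32 reason)
    {
            struct tcm_vhost_evt *evt;

            mutex_lock(&vs->dev.mutex);
            if (vs->vs_events_nr > VHOST_SCSI_MAX_EVENT) {
                    vs->vs_events_dropped = true;   /* record the miss here... */
                    mutex_unlock(&vs->dev.mutex);
                    return NULL;                    /* ...so callers need not */
            }

            evt = kzalloc(sizeof(*evt), GFP_KERNEL);
            if (evt) {
                    evt->event.event = event;
                    evt->event.reason = reason;
                    vs->vs_events_nr++;
            }
            mutex_unlock(&vs->dev.mutex);

            return evt;
    }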

  + 	mutex_unlock(&vs->dev.mutex);
  + 	return NULL;
  + }
  +
  + evt = kzalloc(sizeof(*evt), GFP_KERNEL);
  + if (evt) {
  + 	evt->event.event = event;
  + 	evt->event.reason = reason;
  + 	vs->vs_events_nr++;
  + }
  + mutex_unlock(&vs->dev.mutex);
 +
 + return evt;
 +}
 +
  static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
  {
  	struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
  @@ -388,6 +437,77 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
  	kfree(tv_cmd);
  }
  
 +static void tcm_vhost_do_evt_work(struct vhost_scsi *vs,
 + struct virtio_scsi_event *event)
 +{
  + struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT];
 + struct virtio_scsi_event __user *eventp;
 + unsigned out, in;
 + int head, ret;
 +
 + if (!tcm_vhost_check_endpoint(vq))
 + return;
 +
  + mutex_lock(&vq->mutex);
  +again:
  + vhost_disable_notify(&vs->dev, vq);
  + head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
  + 		ARRAY_SIZE(vq->iov), &out, &in,
  + 		NULL, NULL);
  + if (head < 0) {
  + 	mutex_lock(&vs->dev.mutex);

This nests dev mutex within vq mutex, can cause deadlock.
Should be the other way around.
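
(Illustrative only — the ordering being asked for, with dev.mutex as the
outer lock everywhere and vq->mutex as the inner one:)

    /* Sketch: take the locks in a fixed order to avoid the inversion. */
    mutex_lock(&vs->dev.mutex);     /* outer: device-wide state */
    mutex_lock(&vq->mutex);         /* inner: the event virtqueue */
    /* ...get a descriptor, or mark vs->vs_events_dropped on failure... */
    mutex_unlock(&vq->mutex);
    mutex_unlock(&vs->dev.mutex);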

  + 	vs->vs_events_dropped = true;
  + 	mutex_unlock(&vs->dev.mutex);
  + 	goto out;
  + }
  + if (head == vq->num) {
  + 	if (vhost_enable_notify(&vs->dev, vq))
  + 		goto again;
  + 	mutex_lock(&vs->dev.mutex);
  + 	vs->vs_events_dropped = true;
  + 	mutex_unlock(&vs->dev.mutex);
  + 	goto out;
  + }
  +
  + if ((vq->iov[out].iov_len != sizeof(struct virtio_scsi_event))) {
  + 	vq_err(vq, "Expecting virtio_scsi_event, got %zu bytes\n",
  + 		vq->iov[out].iov_len);
 + goto out;
 + }
 +
 + 

Re: [PATCH V4 0/2] tcm_vhost hotplug

2013-03-24 Thread Michael S. Tsirkin
On Fri, Mar 22, 2013 at 01:39:03PM +0800, Asias He wrote:
 Asias He (2):
   tcm_vhost: Introduce tcm_vhost_check_feature()
   tcm_vhost: Add hotplug/hotunplug support

So this work should stay out of tree until we have
2 users for vhost-scsi, but I sent some comments to
help you make progress.

  drivers/vhost/tcm_vhost.c | 224 
 --
  drivers/vhost/tcm_vhost.h |  10 +++
  2 files changed, 229 insertions(+), 5 deletions(-)
 
 -- 
 1.8.1.4


Re: [Qemu-devel] [PATCH v2] target-i386: Improve x86_cpu_list output

2013-03-24 Thread Andreas Färber
On 2013-02-27 10:37, Jan Kiszka wrote:
 On 2013-02-27 10:33, Andreas Färber wrote:
  On 2013-02-27 10:15, Jan Kiszka wrote:
 Several issues fixed:
  - We were missing a bunch of feature lists. Fix this by simply dumping
the meta list feature_word_info.
  - kvm_enabled() cannot be true at this point because accelerators are
initialized much later during init. Also, hiding this makes it very
hard to discover for users. Simply dump unconditionally if CONFIG_KVM
is set.
  - Add explanation for host CPU type.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---

 Changes in v2:
  - Do not dump host type if CONFIG_KVM is not set
  - Explain that host depends on KVM mode

  I had requested on v1 to not fix multiple issues in one patch, but I can
  split it myself on Friday if there are no other issues.
 
  Sorry, missed that. But I also see no point in splitting it up in this
  case, specifically as we now agree on the result.

Thanks, applied to qom-cpu as two patches (clean git-checkout -p split):
https://github.com/afaerber/qemu-cpu/commits/qom-cpu

Sorry for taking so long.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


[PATCH v3 5/5] KVM: nVMX: Fix conditions for NMI injection

2013-03-24 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

The logic for checking if interrupts can be injected has to be applied
to NMIs as well. The difference is that if NMI interception is on, these
events are consumed and blocked by the VM exit.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 30aa198..c01d487 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4190,6 +4190,12 @@ static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
PIN_BASED_EXT_INTR_MASK;
 }
 
+static bool nested_exit_on_nmi(struct kvm_vcpu *vcpu)
+{
+	return get_vmcs12(vcpu)->pin_based_vm_exec_control &
+		PIN_BASED_NMI_EXITING;
+}
+
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
u32 cpu_based_vm_exec_control;
@@ -4315,6 +4321,35 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu)) {
+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+		if (to_vmx(vcpu)->nested.nested_run_pending ||
+		    vmcs_read32(GUEST_ACTIVITY_STATE) ==
+		       GUEST_ACTIVITY_WAIT_SIPI)
+			return 0;
+		if (nested_exit_on_nmi(vcpu)) {
+			/*
+			 * Check if the idt_vectoring_info_field is free. We
+			 * cannot raise EXIT_REASON_EXCEPTION_NMI if it isn't.
+			 */
+			if (vmcs12->idt_vectoring_info_field &
+			    VECTORING_INFO_VALID_MASK)
+				return 0;
+			nested_vmx_vmexit(vcpu);
+			vmcs12->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
+			vmcs12->vm_exit_intr_info = NMI_VECTOR |
+				INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK;
+			/*
+			 * The NMI-triggered VM exit counts as injection:
+			 * clear this one and block further NMIs.
+			 */
+			vcpu->arch.nmi_pending = 0;
+			vmx_set_nmi_mask(vcpu, true);
+			return 0;
+		}
+	}
+
	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
return 0;
 
-- 
1.7.3.4



[PATCH v3 4/5] KVM: nVMX: Fix conditions for interrupt injection

2013-03-24 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

If we are in guest mode, L0 can only inject events into L2 if L1 has
nothing pending. Otherwise, L0 would overwrite L1's events and they
would get lost. But even if no injection of L1 is pending, we do not
want L0 to unnecessarily interrupt an ongoing vmentry with all its side
effects on the vmcs. Therefore, injection shall be disallowed during
L1->L2 transitions. This check is conceptually independent of
nested_exit_on_intr.

If L1 traps external interrupts, then we also need to look at L1's
idt_vectoring_info_field. If it is empty, we can kick the guest from L2
to L1, just like the previous code worked.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c |   28 
 1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d1bc834..30aa198 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4325,16 +4325,28 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+	if (is_guest_mode(vcpu)) {
 		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-		if (to_vmx(vcpu)->nested.nested_run_pending ||
-		    (vmcs12->idt_vectoring_info_field &
-		     VECTORING_INFO_VALID_MASK))
+
+		if (to_vmx(vcpu)->nested.nested_run_pending)
 			return 0;
-		nested_vmx_vmexit(vcpu);
-		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
-		vmcs12->vm_exit_intr_info = 0;
-		/* fall through to normal code, but now in L1, not L2 */
+		if (nested_exit_on_intr(vcpu)) {
+			/*
+			 * Check if the idt_vectoring_info_field is free. We
+			 * cannot raise EXIT_REASON_EXTERNAL_INTERRUPT if it
+			 * isn't.
+			 */
+			if (vmcs12->idt_vectoring_info_field &
+			    VECTORING_INFO_VALID_MASK)
+				return 0;
+			nested_vmx_vmexit(vcpu);
+			vmcs12->vm_exit_reason =
+				EXIT_REASON_EXTERNAL_INTERRUPT;
+			vmcs12->vm_exit_intr_info = 0;
+			/*
+			 * fall through to normal code, but now in L1, not L2
+			 */
+		}
 	}
 
	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
-- 
1.7.3.4



[PATCH v3 2/5] KVM: nVMX: Rework event injection and recovery

2013-03-24 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1, i.e. if we enter
prepare_vmcs12.

vmcs12_save_pending_events takes care to transfer pending L0 events into
the queue of L1. That is mandatory as L1 may decide to switch the guest
state completely, invalidating or preserving the pending events for
later injection (including on a different node, once we support
migration).

This concept is based on the rule that a pending vmlaunch/vmresume is
not canceled. Otherwise, we would risk losing injected events or leaking
them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
entry of nested_vmx_vmexit.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c |   90 +--
 1 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8827b3b..9d9ff74 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6493,8 +6493,6 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
-	if (is_guest_mode(&vmx->vcpu))
-		return;
	__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
				  VM_EXIT_INSTRUCTION_LEN,
				  IDT_VECTORING_ERROR_CODE);
@@ -6502,8 +6500,6 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 
 static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
-   if (is_guest_mode(vcpu))
-   return;
__vmx_complete_interrupts(vcpu,
  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
  VM_ENTRY_INSTRUCTION_LEN,
@@ -6535,21 +6531,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long debugctlmsr;
 
-	if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-		if (vmcs12->idt_vectoring_info_field &
-		    VECTORING_INFO_VALID_MASK) {
-			vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
-				vmcs12->idt_vectoring_info_field);
-			vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
-				vmcs12->vm_exit_instruction_len);
-			if (vmcs12->idt_vectoring_info_field &
-			    VECTORING_INFO_DELIVER_CODE_MASK)
-				vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
-					vmcs12->idt_vectoring_error_code);
-		}
-	}
-
	/* Record the guest's net vcpu time for enforced NMI injections. */
	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
		vmx->entry_time = ktime_get();
@@ -6708,17 +6689,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
-	if (is_guest_mode(vcpu)) {
-		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-		vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info;
-		if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) {
-			vmcs12->idt_vectoring_error_code =
-				vmcs_read32(IDT_VECTORING_ERROR_CODE);
-			vmcs12->vm_exit_instruction_len =
-				vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-		}
-	}
-
	vmx->loaded_vmcs->launched = 1;
 
	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
@@ -7325,6 +7295,48 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
			vcpu->arch.cr4_guest_owned_bits));
 }
 
+static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
+  struct vmcs12 *vmcs12)
+{
+   u32 idt_vectoring;
+   unsigned int nr;
+
+	if (vcpu->arch.exception.pending) {
+		nr = vcpu->arch.exception.nr;
+		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
+
+		if (kvm_exception_is_soft(nr)) {
+			vmcs12->vm_exit_instruction_len =
+				vcpu->arch.event_exit_inst_len;
+			idt_vectoring |= INTR_TYPE_SOFT_EXCEPTION;
+		} else
+			idt_vectoring |= INTR_TYPE_HARD_EXCEPTION;
+
+		if (vcpu->arch.exception.has_error_code) {
+			idt_vectoring |= VECTORING_INFO_DELIVER_CODE_MASK;
+			vmcs12->idt_vectoring_error_code =
+				vcpu->arch.exception.error_code;
+		}
+
+		vmcs12->idt_vectoring_info_field = idt_vectoring;
+	} else if 

[PATCH v3 1/5] KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1

2013-03-24 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Check if the interrupt or NMI window exit is for L1 by testing if it has
the corresponding controls enabled. This is required when we allow
direct injection from L0 to L2.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Reviewed-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/vmx.c |9 ++---
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 03f5746..8827b3b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6112,14 +6112,9 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_TRIPLE_FAULT:
return 1;
case EXIT_REASON_PENDING_INTERRUPT:
+   return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_INTR_PENDING);
case EXIT_REASON_NMI_WINDOW:
-   /*
-* prepare_vmcs02() set the CPU_BASED_VIRTUAL_INTR_PENDING bit
-* (aka Interrupt Window Exiting) only when L1 turned it on,
-* so if we got a PENDING_INTERRUPT exit, this must be for L1.
-* Same for NMI Window Exiting.
-*/
-   return 1;
+   return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_NMI_PENDING);
case EXIT_REASON_TASK_SWITCH:
return 1;
case EXIT_REASON_CPUID:
-- 
1.7.3.4



[PATCH v3 0/5] KVM: nVMX: Make direct IRQ/NMI injection work

2013-03-24 Thread Jan Kiszka
This version addresses the comment on patch 2, simplifying it
significantly by dropping everything that assumed an L2 vmentry on
vmlaunch/vmresume could be canceled by an emulated vmexit to L1.

Jan Kiszka (5):
  KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to
L1
  KVM: nVMX: Rework event injection and recovery
  KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask
  KVM: nVMX: Fix conditions for interrupt injection
  KVM: nVMX: Fix conditions for NMI injection

 arch/x86/kvm/vmx.c |  182 +++
 1 files changed, 125 insertions(+), 57 deletions(-)

-- 
1.7.3.4



[PATCH v3 3/5] KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask

2013-03-24 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

vmx_set_nmi_mask will soon be used by vmx_nmi_allowed. No functional
changes.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9d9ff74..d1bc834 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4284,16 +4284,6 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
 }
 
-static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
-{
-	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
-		return 0;
-
-	return	!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
-		  (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI
-		   | GUEST_INTR_STATE_NMI));
-}
-
 static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
if (!cpu_has_virtual_nmis())
@@ -4323,6 +4313,16 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
}
 }
 
+static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
+{
+	if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
+		return 0;
+
+	return	!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+		  (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI
+		   | GUEST_INTR_STATE_NMI));
+}
+
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
-- 
1.7.3.4



Re: [PATCH] KVM: x86: Avoid busy loops over uninjectable pending APIC timers

2013-03-24 Thread Gleb Natapov
On Sun, Mar 24, 2013 at 10:45:53AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-03-22:
  On Fri, Mar 22, 2013 at 07:43:03AM -0300, Marcelo Tosatti wrote:
  On Fri, Mar 22, 2013 at 08:53:15AM +0200, Gleb Natapov wrote:
  On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
  On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
  On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
   But current PI patches do break them, that's my point. So we
   either need to revise them again, or drop LAPIC timer
   reinjection. Making apic_accept_irq's semantics "it returns
   coalescing info, but only sometimes" is dubious though.
   We may roll back to the initial idea: test both IRR and PIR to get
   coalescing info. In this case, always inject the LAPIC timer in vcpu
   context, so apic_accept_irq() will return the right coalescing info.
   Also, we need to add comments telling the caller that apic_accept_irq()
   can ensure the return value is correct only when the caller is in the
   target vcpu's context.
  
   We cannot touch IRR while the vcpu is in non-root operation, so we
   will have to pass a flag to apic_accept_irq() to let it know that it
   is called synchronously. While all this is possible, I want to know
   which guests exactly we will break if we do not track interrupt
   coalescing for the LAPIC timer. If only 2.0 SMP kernels break, we
   can probably drop it.
  
  RHEL4 / RHEL5 guests.
   RHEL5 has kvmclock, no? We should not break RHEL4 though.
  
  kvmclock provides no timer interrupt... either LAPIC or PIT must be used
  with kvmclock.
   I am confused now. If the LAPIC is not used for wallclock time keeping,
   but only for scheduling, the reinjection is actually harmful.
   Reinjecting the interrupt will cause needless task rescheduling. So the
   question is if there is a Linux kernel that uses the LAPIC for
   wallclock time keeping and relies on an accurate number of injected
   interrupts to avoid time drift.
  
  See 4acd47cfea9c18134e0cbf915780892ef0ff433a on RHEL5, RHEL5 kernels
  before that commit did not reinject.  Which means that all non-RHEL
  Linux guests based on that upstream code also suffer from the same
  problem.
  
   The commit actually fixes the guest, not the host. The existence of the
   commit also means that LAPIC timer reinjection does not solve the
   problem and all guests without this commit will suffer from the bug
   regardless of what we decide to do here. Without LAPIC timer
   reinjection the effect of the bug will be much more visible and
   long-lasting though.
  
   Also, any other algorithm which uses LAPIC timers and compares them
   with other clocks (such as the NMI watchdog) is potentially vulnerable.
   They are with or without timer reinjection, as the commit you pointed
   to shows.
  
  
  Can drop it, and then wait until someone complains (if so).
  
   Yes, tough decision to make. All the complaints will be guest bugs
   which can be hit without reinjection too, but with less probability.
   The reason we are so keen on keeping RTC reinjection is that the guests
   that depend on it cannot be fixed.
  
   Knowing that Linux tends to disable interrupts, it is likely that it
   tries to detect and compensate for missing interrupts.
  
  As said above, any algorithm which compares LAPIC timer interrupt with
  another clock is vulnerable.
 Any conclusion? 
 
Let's not check for coalescing in the PI patches for now.

--
Gleb.


Re: [PATCH] vfio powerpc: implement IOMMU driver for VFIO

2013-03-24 Thread David Gibson
On Wed, Mar 20, 2013 at 09:16:24PM -0600, Alex Williamson wrote:
 On Thu, 2013-03-21 at 12:55 +1100, David Gibson wrote:
  On Wed, Mar 20, 2013 at 02:45:24PM -0600, Alex Williamson wrote:
   On Tue, 2013-03-19 at 18:08 +1100, Alexey Kardashevskiy wrote:
VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.

The platform dependent part includes IOMMU initialization
and handling. This patch implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWERPC
guest).

The counterpart in QEMU is required to support this functionality.

Changelog:
* documentation updated
* container enable/disable ioctls added
* request_module(spapr_iommu) added
* various locks fixed

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
   
   
   Looking pretty good.  There's one problem with the detach_group,
   otherwise just some trivial comments below.  What's the status of the
   tce code that this depends on?  Thanks,
  
  [snip]
+static void tce_iommu_detach_group(void *iommu_data,
+		struct iommu_group *iommu_group)
+{
+	struct tce_container *container = iommu_data;
+	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+
+	BUG_ON(!tbl);
+	mutex_lock(&container->lock);
+	if (tbl != container->tbl) {
+		pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
+				iommu_group_id(iommu_group),
+				iommu_group_id(tbl->it_group));
+	} else if (container->enabled) {
+		pr_err("tce_vfio: detaching group #%u from enabled container\n",
+				iommu_group_id(tbl->it_group));
   
   Hmm, something more than a pr_err needs to happen here.  Wouldn't this
   imply a disable and going back to an unprivileged container?
  
  Uh, no.  I think the idea here is that we use the enable/disable
  semantic to address some other potential problems.  Specifically,
  sidestepping the problems of what happens if you change the
  container's capabilities by adding/removing groups while in the middle
  of using it.  So the point is that the detach fails when the group is
  enabled, rather than implicitly doing anything.
 
 The function returns void.

Uh, yeah, we should probably fix that.

  We're not failing the detach, just getting
 into a broken state.  This is only called to unwind attaching groups
 when the iommu is set or if the user explicitly calls
 GROUP_UNSET_CONTAINER.  The former won't have had a chance to call
 enable but the latter would need to be fixed.

Well, in the latter case it's essentially a silent failure, which isn't
great.
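
(Roughly what fixing that might look like; note that making the
detach_group callback return an error is hypothetical here, since the
current signature is void:)

    static int tce_iommu_detach_group(void *iommu_data,
                    struct iommu_group *iommu_group)
    {
            struct tce_container *container = iommu_data;
            int ret = 0;

            mutex_lock(&container->lock);
            if (container->enabled) {
                    pr_err("tce_vfio: refusing to detach group #%u from enabled container\n",
                                    iommu_group_id(iommu_group));
                    ret = -EBUSY;   /* caller must disable the container first */
            } else {
                    /* detach proper, as in the patch above */
            }
            mutex_unlock(&container->lock);

            return ret;
    }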

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



