Re: [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion

2013-02-25 Thread Nadav Har'El
On Sun, Feb 24, 2013, Jan Kiszka wrote about [PATCH] KVM: nSVM/nVMX: Implement 
vmexit on INIT assertion:
 From: Jan Kiszka jan.kis...@siemens.com
 
 On Intel, raising INIT causing an unconditional vmexit. On AMD, this is
 controlled by the interception mask.

Hi,

I never tried to closely follow the KVM code paths related this code,
but I do have one question: The VMX spec says:

   The INIT signal is blocked whenever a logical processor is in VMX root
operation. It is not blocked in VMX non-root operation. Instead, INITs
cause VM exits (see Section 22.3, Other Causes of VM Exits).

So when running a non-nested L1 guest, or an L2 guest, the new behavior
appears correct. However, it looks like if L1 is running in root
mode (i.e., did VMXON once but not running L2 now), the INIT signal
needs to be blocked, not do what it does now. It appears (but I'm not sure...)
that right now, it causes the L1 guest to lock up, and is not ignored? 

Nadav.

-- 
Nadav Har'El|Monday, Feb 25 2013, 15 Adar 5773
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |If con is the opposite of pro, is
http://nadav.harel.org.il   |congress the opposite of progress?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 54061] guest panic after live migration

2013-02-25 Thread Xiao Guangrong
On 02/20/2013 04:06 PM, bugzilla-dae...@bugzilla.kernel.org wrote:
 https://bugzilla.kernel.org/show_bug.cgi?id=54061
 
 
 
 
 
 --- Comment #2 from Jay Ren yongjie@intel.com  2013-02-20 08:06:16 ---
 Marcleo,
 
 yes, after reverting that commit caf6900f2d8, live-migration can work fine.
 

Sorry for the wrong patch and the bug is: the fast page fault path
can make large-spte writeable without probably set dirty bitmap for
all small pages.

Two ways to fix these:
1): covert the large-spte to small sptes and wirte-protect them, then
no readonly large-spte exists.

2): only allow fast page fault to fix #PF on small spte.

Seems the first way is better because the second way can cause useless
page table waking.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 54061] guest panic after live migration

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54061





--- Comment #3 from Anonymous Emailer anonym...@kernel-bugs.osdl.org  
2013-02-25 08:26:26 ---
Reply-To: xiaoguangr...@linux.vnet.ibm.com

On 02/20/2013 04:06 PM, bugzilla-dae...@bugzilla.kernel.org wrote:
 https://bugzilla.kernel.org/show_bug.cgi?id=54061
 
 
 
 
 
 --- Comment #2 from Jay Ren yongjie@intel.com  2013-02-20 08:06:16 ---
 Marcleo,
 
 yes, after reverting that commit caf6900f2d8, live-migration can work fine.
 

Sorry for the wrong patch and the bug is: the fast page fault path
can make large-spte writeable without probably set dirty bitmap for
all small pages.

Two ways to fix these:
1): covert the large-spte to small sptes and wirte-protect them, then
no readonly large-spte exists.

2): only allow fast page fault to fix #PF on small spte.

Seems the first way is better because the second way can cause useless
page table waking.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 0/4] kvm: Make ioeventfd usable on s390.

2013-02-25 Thread Cornelia Huck
On Sun, 24 Feb 2013 11:56:39 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Feb 22, 2013 at 01:09:45PM +0100, Cornelia Huck wrote:
  Here's the second attempt at implementing ioeventfd for s390.
 
 The patchset looks fine overall.
 Minor comments and questions below.

Cool, thanks for reviewing.

 
  
  Rather than the architecture-specific functions used in v1, we
  now try to integrate with the kvm_io_device infrastructure.
  Calls to diagnose 500 subcode 3 are now mapped to _write.
  These devices are created on a new KVM_CSS_BUS when using a
  new flag KVM_IOEVENTFD_FLAG_CSS. addr and datamatch are (ab)used
  to contain the subchannel id and the virtqueue.
  
  A drawback is that this interface is not easily extendable should
  we want to attach other hypercalls or carry more payload.
 
 Under s390 kvm_hypercallX already uses diagnose 500 so that seems
 fine. If you want to make it more generic and support
 more subcodes, I think you'll have to pass an extra u64 field:
 to bus to both avoid overflowing int value and avoid ugly
 bus-specific hacks in generic code.
 
 Will we ever need that? Code using subcode 3 does not yet seem
 to be upstream in 3.8 so maybe yes, but you decide.

Subcode 3 will be in 3.9.

 An alternative is to add new bus types when kvm needs to handle
 new subcodes. So e.g. KVM_BUS_S390_VIRTIO_CCW_NOTIFY and
 KVM_BUS_S390_VIRTIO_NOTIFY ?

I think I'll fall back to a new bus type should we ever need a new
notification type - the less strange hacks, the better.

 
 You decide, I'm fine with either approach.
 
 More minor comments and questions in response to individual patches.
 
  Another limitation is the limit of 1000 io devices per bus, which
  we would hit easily with a few hundred devices, but that should
  be fixable.
  
  v1 - v2:
  - Move irqfd initialization from a module init function to kvm_init,
eliminating the need for a second module for kvm/s390.
  - Use kvm_io_device for s390 css devices.
  
  
  Cornelia Huck (4):
KVM: Initialize irqfd from kvm_init().
KVM: Introduce KVM_CSS_BUS.
KVM: ioeventfd for s390 css devices.
KVM: s390: Wire up ioeventfd.
  
   Documentation/virtual/kvm/api.txt |  7 +++
   arch/s390/kvm/Kconfig |  1 +
   arch/s390/kvm/Makefile|  2 +-
   arch/s390/kvm/diag.c  | 25 +
   arch/s390/kvm/kvm-s390.c  |  1 +
   include/linux/kvm_host.h  | 14 ++
   include/uapi/linux/kvm.h  |  2 ++
   virt/kvm/eventfd.c| 15 ---
   virt/kvm/kvm_main.c   |  6 ++
   9 files changed, 65 insertions(+), 8 deletions(-)
  
  -- 
  1.7.12.4
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Zhang, Yang Z
Avi Kivity wrote on 2013-02-25:
 I didn't really follow, but is the root cause the need to keep track
 of interrupt coalescing?  If so we can recommend that users use
 KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection
 with irq coalescing support to vcpu context.
So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt 
is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this 
acceptable?

The only case in KVM that need to know the interrupt injection status is vlapic 
timer. But since vlapic timer and vcpu are always in same pcpu, so there is no 
problem.

 It's not pleasant to cause a performance regression, so it should be a
 last resort (or maybe do both if it helps).
 
 On Sun, Feb 24, 2013 at 8:08 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Sun, Feb 24, 2013 at 04:19:17PM +0200, Gleb Natapov wrote:
 On Sun, Feb 24, 2013 at 01:55:07PM +, Zhang, Yang Z wrote:
 I do not think it fixes it. There is no guaranty that IPI will be
 processed by remote cpu while sending cpu is still in locked section, so
 the same race may happen regardless. As you say above there are 3
 contexts, but only two use locks.
 See following logic, I think the problem you mentioned will not
 happened with current logic.
 
 get lock
 if test_pir (this will ensure there is no in flight IPI for same 
 interrupt. Since
 we are taking the lock now, no IPI will be sent before we release the lock 
 and no
 pir-irr is performed by hardware for same interrupt.)
 
 Does the PIR IPI interrupt returns synchronously _after_ PIR-IRR transfer
 is made? Don't think so.
 
 PIR IPI interrupt returns after remote CPU has acked interrupt receival,
 not after remote CPU has acked _and_ performed PIR-IRR transfer.
 
 Right? (yes, right, see step 4. below).
 
 Should try to make it easier to drop the lock later, so depend on it as
 little as possible (also please document what it protects in detail).
 
 Also note:
 
 
 3. The processor clears the outstanding-notification bit in the
 posted-interrupt descriptor. This is done atomically
 so as to leave the remainder of the descriptor unmodified (e.g., with a
 locked AND operation).
 4. The processor writes zero to the EOI register in the local APIC; this
 dismisses the interrupt with the postedinterrupt
 notification vector from the local APIC.
 5. The logical processor performs a logical-OR of PIR into VIRR and
 clears PIR. No other agent can read or write a
 PIR bit (or group of bits) between the time it is read (to determine
 what to OR into VIRR) and when it is cleared.
 
 
 So checking the ON bit does not mean the HW has finished performing
 PIR-VIRR transfer (which means ON bit can only be used as an indication
 of whether to send interrupt, not whether PIR-VIRR transfer is
 finished).
 
 So its fine to do
 
 - atomic set pir
 - if (atomic_test_and_set(PIR ON bit))
 -  send IPI
 
 But HW can transfer two distinct bits, set by different serialized instances
 of kvm_set_irq (protected with a lock), in a single PIR-IRR pass.
 
 I do not see where those assumptions are coming from. Testing pir does
 not guaranty that the IPI is not processed by VCPU right now.
 
 iothread:   vcpu:
 send_irq()
 lock(pir)
 check pir and irr
 set pir
 send IPI (*)
 unlock(pir)
 
 send_irq()
 lock(pir)
  receive IPI (*)
  atomic {
pir_tmp = pir
pir = 0
  } check pir and irr irr = pir_tmp
 set pir
 send IPI
 unlock(pir)
 
 At this point both pir and irr are set and interrupt may be coalesced,
 but it is reported as delivered.
 
 s/set pir/injected = !t_a_s(pir)/
 
 So what prevents the scenario above from happening?
 
 --
   Gleb.


Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 4/4] KVM: s390: Wire up ioeventfd.

2013-02-25 Thread Cornelia Huck
On Sun, 24 Feb 2013 11:45:22 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Feb 22, 2013 at 01:09:49PM +0100, Cornelia Huck wrote:
  Enable ioeventfd support on s390 and hook up diagnose 500 virtio-ccw
  notifications.
  
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
  ---
   arch/s390/kvm/Kconfig|  1 +
   arch/s390/kvm/Makefile   |  2 +-
   arch/s390/kvm/diag.c | 25 +
   arch/s390/kvm/kvm-s390.c |  1 +
   4 files changed, 28 insertions(+), 1 deletion(-)
  
  diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
  index b58dd86..3c43e30 100644
  --- a/arch/s390/kvm/Kconfig
  +++ b/arch/s390/kvm/Kconfig
  @@ -22,6 +22,7 @@ config KVM
  select PREEMPT_NOTIFIERS
  select ANON_INODES
  select HAVE_KVM_CPU_RELAX_INTERCEPT
  +   select HAVE_KVM_EVENTFD
  ---help---
Support hosting paravirtualized guest machines using the SIE
virtualization capability on the mainframe. This should work
  diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
  index 3975722..8fe9d65 100644
  --- a/arch/s390/kvm/Makefile
  +++ b/arch/s390/kvm/Makefile
  @@ -6,7 +6,7 @@
   # it under the terms of the GNU General Public License (version 2 only)
   # as published by the Free Software Foundation.
   
  -common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
  +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o)
   
   ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
   
  diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
  index a390687..13cf74a 100644
  --- a/arch/s390/kvm/diag.c
  +++ b/arch/s390/kvm/diag.c
  @@ -104,6 +104,29 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
  return -EREMOTE;
   }
   
  +static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
  +{
  +   int ret, idx;
  +
  +   /* No virtio-ccw notification? Get out quickly. */
  +   if (!vcpu-kvm-arch.css_support ||
  +   (vcpu-run-s.regs.gprs[1] != 3))
  +   return -EOPNOTSUPP;
 
 Looks like we have:
 arch/s390/include/uapi/asm/kvm_virtio.h:#define KVM_S390_VIRTIO_NOTIFY 0
 arch/s390/include/uapi/asm/kvm_virtio.h:#define KVM_S390_VIRTIO_RESET 1
 arch/s390/include/uapi/asm/kvm_virtio.h:#define KVM_S390_VIRTIO_SET_STATUS
   2
 
 KVM_S390_VIRTIO_CSS_NOTIFY is missing? Strange.
 Maybe it should be added in that header and included here?

kvm_virtio.h is for the old-style s390-virtio mechanism. Subcode 3 will
be included in 3.9.

 Do some guests use it?
 Also, do you want to support KVM_S390_VIRTIO_NOTIFY as well?

We don't want to invest any more effort in the old style s390-virtio
transport (other than keeping it working). Any future work will be for
virtio-ccw.

 
 
  +
  +   idx = srcu_read_lock(vcpu-kvm-srcu);
  +   /*
  +* The layout is as follows:
  +* - gpr 2 contains the subchannel id (passed as addr)
  +* - gpr 3 contains the virtqueue index (passed as datamatch)
  +*/
  +   ret = kvm_io_bus_write(vcpu-kvm, KVM_CSS_BUS,
  +   vcpu-run-s.regs.gprs[2],
  +   8, vcpu-run-s.regs.gprs[3]);
  +   srcu_read_unlock(vcpu-kvm-srcu, idx);
  +   /* kvm_io_bus_write returns -EOPNOTSUPP if it found no match. */
  +   return ret;
  +}
  +
   int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
   {
  int code = (vcpu-arch.sie_block-ipb  0xfff)  16;
  @@ -118,6 +141,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
  return __diag_time_slice_end_directed(vcpu);
  case 0x308:
  return __diag_ipl_functions(vcpu);
  +   case 0x500:
  +   return __diag_virtio_hypercall(vcpu);
  default:
  return -EOPNOTSUPP;
  }
  diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
  index f822d36..04d2454 100644
  --- a/arch/s390/kvm/kvm-s390.c
  +++ b/arch/s390/kvm/kvm-s390.c
  @@ -142,6 +142,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  case KVM_CAP_ONE_REG:
  case KVM_CAP_ENABLE_CAP:
  case KVM_CAP_S390_CSS_SUPPORT:
  +   case KVM_CAP_IOEVENTFD:
  r = 1;
  break;
  case KVM_CAP_NR_VCPUS:
  -- 
  1.7.12.4
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 3/4] KVM: ioeventfd for s390 css devices.

2013-02-25 Thread Cornelia Huck
On Sun, 24 Feb 2013 11:47:50 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Feb 22, 2013 at 01:09:48PM +0100, Cornelia Huck wrote:
  Enhance KVM_IOEVENTFD with a new flag that allows to attach to s390 css
  devices.
  
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
  ---
   Documentation/virtual/kvm/api.txt | 7 +++
   include/uapi/linux/kvm.h  | 2 ++
   virt/kvm/eventfd.c| 8 ++--
   3 files changed, 15 insertions(+), 2 deletions(-)
  
  diff --git a/Documentation/virtual/kvm/api.txt 
  b/Documentation/virtual/kvm/api.txt
  index c2534c3..40e799c 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -1468,15 +1468,22 @@ struct kvm_ioeventfd {
  __u8  pad[36];
   };
   
  +For the special case of s390 css devices, the ioevent is matched to a
  +subchannel/virtqueue tuple instead.
  +
   The following flags are defined:
   
   #define KVM_IOEVENTFD_FLAG_DATAMATCH (1  kvm_ioeventfd_flag_nr_datamatch)
   #define KVM_IOEVENTFD_FLAG_PIO   (1  kvm_ioeventfd_flag_nr_pio)
   #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1  kvm_ioeventfd_flag_nr_deassign)
  +#define KVM_IOEVENTFD_FLAG_CSS   (1  kvm_ioeventfd_flag_nr_css)
   
   If datamatch flag is set, the event will be signaled only if the written 
  value
   to the registered address is equal to datamatch in struct kvm_ioeventfd.
   
  +For css devices, addr contains the subchannel id and datamatch the 
  virtqueue
  +index.
  +
   
   4.60 KVM_DIRTY_TLB
   
  diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
  index 9a2db57..1df0766 100644
  --- a/include/uapi/linux/kvm.h
  +++ b/include/uapi/linux/kvm.h
  @@ -448,12 +448,14 @@ enum {
  kvm_ioeventfd_flag_nr_datamatch,
  kvm_ioeventfd_flag_nr_pio,
  kvm_ioeventfd_flag_nr_deassign,
  +   kvm_ioeventfd_flag_nr_css,
  kvm_ioeventfd_flag_nr_max,
   };
   
   #define KVM_IOEVENTFD_FLAG_DATAMATCH (1  kvm_ioeventfd_flag_nr_datamatch)
   #define KVM_IOEVENTFD_FLAG_PIO   (1  kvm_ioeventfd_flag_nr_pio)
   #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1  kvm_ioeventfd_flag_nr_deassign)
  +#define KVM_IOEVENTFD_FLAG_CSS   (1  kvm_ioeventfd_flag_nr_css)
 
 So let's explicitly name is KVM_IOEVENTFD_S390_VIRTIO_CCW or even
 KVM_IOEVENTFD_S390_VIRTIO_CCW_NOTIFY?

How about KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY?

 Also, maybe we want to add KVM_IOEVENTFD_S390_VIRTIO_NOTIFY as well?

No, since s390-virtio is a dead end.

 
   
   #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1  kvm_ioeventfd_flag_nr_max) - 
  1)
   
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index f0ced1a..7347652 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -679,11 +679,13 @@ static int
   kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
   {
  int   pio = args-flags  KVM_IOEVENTFD_FLAG_PIO;
  -   enum kvm_bus  bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS;
  +   int   css = args-flags  KVM_IOEVENTFD_FLAG_CSS;
  +   enum kvm_bus  bus_idx;
  struct _ioeventfd*p;
  struct eventfd_ctx   *eventfd;
  int   ret;
   
  +   bus_idx = pio ? KVM_PIO_BUS : css ? KVM_CSS_BUS: KVM_MMIO_BUS;
  /* must be natural-word sized */
  switch (args-len) {
  case 1:
  @@ -759,11 +761,13 @@ static int
   kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
   {
  int   pio = args-flags  KVM_IOEVENTFD_FLAG_PIO;
  -   enum kvm_bus  bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS;
  +   int   css = args-flags  KVM_IOEVENTFD_FLAG_CSS;
  +   enum kvm_bus  bus_idx;
  struct _ioeventfd*p, *tmp;
  struct eventfd_ctx   *eventfd;
  int   ret = -ENOENT;
   
  +   bus_idx = pio ? KVM_PIO_BUS : css ? KVM_CSS_BUS: KVM_MMIO_BUS;
  eventfd = eventfd_ctx_fdget(args-fd);
  if (IS_ERR(eventfd))
  return PTR_ERR(eventfd);
  -- 
  1.7.12.4
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 2/4] KVM: Introduce KVM_CSS_BUS.

2013-02-25 Thread Cornelia Huck
On Sun, 24 Feb 2013 11:57:25 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Feb 22, 2013 at 01:09:47PM +0100, Cornelia Huck wrote:
  Add a new bus type for s390 css kvm io devices.
  
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
  ---
   include/linux/kvm_host.h | 1 +
   1 file changed, 1 insertion(+)
  
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index 3b768ef..59be516 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -148,6 +148,7 @@ struct kvm_io_bus {
   enum kvm_bus {
  KVM_MMIO_BUS,
  KVM_PIO_BUS,
  +   KVM_CSS_BUS,
 
 so maybe KVM_S390_VIRTIO_CCW_NOTIFY_BUS
 
 to make the fact it's s390 and virtio specific, explicit?

Hm, I don't really see a need to point out that this is s390-specific,
so I think I'd prefer KVM_VIRTIO_CCW_NOTIFY_BUS.

 
  KVM_NR_BUSES
   };
   
  -- 
  1.7.12.4
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion

2013-02-25 Thread Jan Kiszka
On 2013-02-25 09:00, Nadav Har'El wrote:
 On Sun, Feb 24, 2013, Jan Kiszka wrote about [PATCH] KVM: nSVM/nVMX: 
 Implement vmexit on INIT assertion:
 From: Jan Kiszka jan.kis...@siemens.com

 On Intel, raising INIT causing an unconditional vmexit. On AMD, this is
 controlled by the interception mask.
 
 Hi,
 
 I never tried to closely follow the KVM code paths related this code,
 but I do have one question: The VMX spec says:
 
The INIT signal is blocked whenever a logical processor is in VMX root
 operation. It is not blocked in VMX non-root operation. Instead, INITs
 cause VM exits (see Section 22.3, Other Causes of VM Exits).
 
 So when running a non-nested L1 guest, or an L2 guest, the new behavior
 appears correct. However, it looks like if L1 is running in root
 mode (i.e., did VMXON once but not running L2 now), the INIT signal
 needs to be blocked, not do what it does now. It appears (but I'm not sure...)
 that right now, it causes the L1 guest to lock up, and is not ignored? 

Yeah, forgot about this detail. Unfortunately, this means we need to
track the signal state, processing it on vmentry and, on AMD, when GIF
becomes 1.

Is the nested-related state already saved on AMD, Jörg? If not, adding
this one would not make things worse at least. Still, missing user space
save/restore already breaks reset, not only migration (dunno if this is
better on AMD).

Jan



signature.asc
Description: OpenPGP digital signature


Re: Tracing kvm: kvm_entry and kvm_exit

2013-02-25 Thread Stefan Hajnoczi
On Fri, Feb 22, 2013 at 11:34:27AM -0500, Mohamad Gebai wrote:
 I am tracing kvm using perf and I am analyzing the sequences of kvm_entry and
 kvm_exit tracepoints.
 I noticed that during the boot process of a VM, there are a lot more (2 to 3 
 as
 many times) kvm_entry event than there are kvm_exit. I tried looking around 
 but
 didn't find anything that explains this. Is this missing instrumentation? Or
 what other path does kvm take that doesn't generate a kvm_exit event?

Gleb Natapov noticed something similar when playing with the perf script
I posted here:

http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/104181

Perhaps there is a code path that is missing trace_kvm_exit().

We didn't investigate why it happens but the unexplained kvm_entry
events only appeared at the beginning of the trace, so the theory was
that events are not activated atomically by perf(1).

CCing perf mailing list.

It would be interesting if someone knows the answer.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
 Avi Kivity wrote on 2013-02-25:
  I didn't really follow, but is the root cause the need to keep track
  of interrupt coalescing?  If so we can recommend that users use
  KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection
  with irq coalescing support to vcpu context.
 So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt 
 is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this 
 acceptable?
 
 The only case in KVM that need to know the interrupt injection status is 
 vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so 
 there is no problem.
 
Not really. The primary user of this interface is RTC interrupt
re-injection for Windows guests.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
 Avi Kivity wrote on 2013-02-25:
 I didn't really follow, but is the root cause the need to keep track
 of interrupt coalescing?  If so we can recommend that users use
 KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection
 with irq coalescing support to vcpu context.
 So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted
 interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does
 this acceptable?
 
 The only case in KVM that need to know the interrupt injection status is 
 vlapic
 timer. But since vlapic timer and vcpu are always in same pcpu, so there is no
 problem.
 
 Not really. The primary user of this interface is RTC interrupt
 re-injection for Windows guests.
So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-02-25:
  On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
  Avi Kivity wrote on 2013-02-25:
  I didn't really follow, but is the root cause the need to keep track
  of interrupt coalescing?  If so we can recommend that users use
  KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection
  with irq coalescing support to vcpu context.
  So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted
  interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does
  this acceptable?
  
  The only case in KVM that need to know the interrupt injection status is 
  vlapic
  timer. But since vlapic timer and vcpu are always in same pcpu, so there is 
  no
  problem.
  
  Not really. The primary user of this interface is RTC interrupt
  re-injection for Windows guests.
 So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?
 
Windows guests may experience timedrift under CPU overcommit scenario.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
 Avi Kivity wrote on 2013-02-25:
 I didn't really follow, but is the root cause the need to keep track
 of interrupt coalescing?  If so we can recommend that users use
 KVM_IRQ_LINE when coalescing is unneeded, and move interrupt
 injection with irq coalescing support to vcpu context.
 So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted
 interrupt is enabled to force users doesn't to use
 KVM_IRQ_LINE_STATUS. Does this acceptable?
 
 The only case in KVM that need to know the interrupt injection status is
 vlapic
 timer. But since vlapic timer and vcpu are always in same pcpu, so there is 
 no
 problem.
 
 Not really. The primary user of this interface is RTC interrupt
 re-injection for Windows guests.
 So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?
 
 Windows guests may experience timedrift under CPU overcommit scenario.
Ok, I see. Seems we are stuck. :(
Do you have any suggestion to solve or workaround current problem?

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] KVM updates for the 3.9 merge window

2013-02-25 Thread Martin Schwidefsky
On Sun, 24 Feb 2013 16:05:31 -0800
Linus Torvalds torva...@linux-foundation.org wrote:

 On Wed, Feb 20, 2013 at 5:17 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 
  Please pull from
 
  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1
 
  to receive the KVM updates for the 3.9 merge window [..]
 
 Ok, particularly the s390 people should check me resolution of the
 conflicts, since they include the renaming of IOINT_VIR to IRQIO_VIR.
 But the uapi header file move should be couble-checked by people who
 use this too.

The IRQIO_VIR is now at the end of the IRQIO_xxx entries while we had
it between the IRQIO_CSC and the IRQIO_PCI entry. No worry, we just
change our code and use the upstream version. Just completed the
double-check and pushed my branches to the linux-s390 tree. The code
is in sync.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM call agenda for 2013-02-26

2013-02-25 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.

Later, Juan.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fixing KVM/ARM breakage in mainline

2013-02-25 Thread Marc Zyngier
Russell, Marcelo, Gleb,

Commit 89f8833 (Merge tag 'kvm-3.9-1' of
git://git.kernel.org/pub/scm/virt/kvm/kvm) broke KVM/ARM. Nothing
fundamental, just annoying enough.

I've posted patches against next-20130215 some time ago:
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/105019

I've now created a branch based off mainline as of this morning, and
containing these fixes:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
kvm-arm/core-3.9-rc1-fixes

Now the question is: who wants to merge this? The diffstat looks like this:

Marc Zyngier (3):
  ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region
  ARM: KVM: Rename KVM_MEMORY_SLOTS - KVM_USER_MEM_SLOTS
  ARM: KVM: fix compilation after removal of user_alloc from struct
kvm_memory_slot

 arch/arm/include/asm/kvm_host.h | 2 +-
 arch/arm/kvm/arm.c  | 4 ++--
 arch/arm/kvm/mmu.c  | 5 -
 3 files changed, 3 insertions(+), 8 deletions(-)

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing KVM/ARM breakage in mainline

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 12:18:27PM +, Marc Zyngier wrote:
 Russell, Marcelo, Gleb,
 
 Commit 89f8833 (Merge tag 'kvm-3.9-1' of
 git://git.kernel.org/pub/scm/virt/kvm/kvm) broke KVM/ARM. Nothing
 fundamental, just annoying enough.
 
 I've posted patches against next-20130215 some time ago:
 http://comments.gmane.org/gmane.comp.emulators.kvm.devel/105019
 
 I've now created a branch based off mainline as of this morning, and
 containing these fixes:
 git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
 kvm-arm/core-3.9-rc1-fixes
 
 Now the question is: who wants to merge this? The diffstat looks like this:
 
I was going to merge it today.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Marcelo Tosatti
On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-02-25:
  On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-02-25:
  On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
  Avi Kivity wrote on 2013-02-25:
  I didn't really follow, but is the root cause the need to keep track
  of interrupt coalescing?  If so we can recommend that users use
  KVM_IRQ_LINE when coalescing is unneeded, and move interrupt
  injection with irq coalescing support to vcpu context.
  So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted
  interrupt is enabled to force users doesn't to use
  KVM_IRQ_LINE_STATUS. Does this acceptable?
  
  The only case in KVM that need to know the interrupt injection status is
  vlapic
  timer. But since vlapic timer and vcpu are always in same pcpu, so there 
  is no
  problem.
  
  Not really. The primary user of this interface is RTC interrupt
  re-injection for Windows guests.
  So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?
  
  Windows guests may experience timedrift under CPU overcommit scenario.
 Ok, I see. Seems we are stuck. :(
 Do you have any suggestion to solve or workaround current problem?

Depend on knowledge about atomicity (item 5 IIRC) of the sequence 
in the manual.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Zhang, Yang Z
Marcelo Tosatti wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
 Avi Kivity wrote on 2013-02-25:
 I didn't really follow, but is the root cause the need to keep track
 of interrupt coalescing?  If so we can recommend that users use
 KVM_IRQ_LINE when coalescing is unneeded, and move interrupt
 injection with irq coalescing support to vcpu context.
 So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when
 posted
 interrupt is enabled to force users doesn't to use
 KVM_IRQ_LINE_STATUS. Does this acceptable?
 
 The only case in KVM that need to know the interrupt injection status is
 vlapic
 timer. But since vlapic timer and vcpu are always in same pcpu, so
 there is no problem.
 
 Not really. The primary user of this interface is RTC interrupt
 re-injection for Windows guests.
 So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?
 
 Windows guests may experience timedrift under CPU overcommit scenario.
 Ok, I see. Seems we are stuck. :(
 Do you have any suggestion to solve or workaround current problem?
 
 Depend on knowledge about atomicity (item 5 IIRC) of the sequence
 in the manual.
I am sure it is not atomic:
tmp_pir=xhcg(pir, 0)
irr |= tmp_pir

Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 12/15] target-ppc: Refactor debug output macros

2013-02-25 Thread Alexander Graf

On 21.02.2013, at 05:25, Andreas Färber wrote:

 Make debug output compile-testable even if disabled.
 
 Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc.
 
 Inline DEBUG_OP check in excp_helper.c.
 Inline LOG_MMU_STATE() in mmu_helper.c.
 Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c.
 
 Signed-off-by: Andreas Färber afaer...@suse.de

I assume you verified that all the bits do get optimized out, right? :)

Reviewed-by: Alexander Graf ag...@suse.de


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 13/15] target-s390x: Refactor debug output macros

2013-02-25 Thread Alexander Graf

On 21.02.2013, at 05:25, Andreas Färber wrote:

 Make debug output compile-testable even if disabled.
 
 Rename dprintf() in kvm.c to kvm_dprintf() due to a conflict with glibc.
 
 Drop unused DEBUG_HELPER and LOG_HELPER() in fpu_helper.c.
 Drop unused LOG_DISAS() in translate.c and inline S390X_DEBUG_DISAS.
 
 Signed-off-by: Andreas Färber afaer...@suse.de

Reviewed-by: Alexander Graf ag...@suse.de


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote:
 On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote:
  Copying Christoffer since ARM has in kernel irq chip too.
  
  On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote:
   Currently, devices that are emulated inside KVM are configured in a
   hardcoded manner based on an assumption that any given architecture
   only has one way to do it.  If there's any need to access device state,
   it is done through inflexible one-purpose-only IOCTLs (e.g.
   KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
   cumbersome and depletes a limited numberspace.
   
   This API provides a mechanism to instantiate a device of a certain
   type, returning an ID that can be used to set/get attributes of the
   device.  Attributes may include configuration parameters (e.g.
   register base address), device state, operational commands, etc.  It
   is similar to the ONE_REG API, except that it acts on devices rather
   than vcpus.
  You are not only provide different way to create in kernel irq chip you
  also use an alternate way to trigger interrupt lines. Before going into
  interface specifics lets think about whether it is really worth it? x86
  obviously support old way and will have to for some, very long, time.
  ARM vGIC code, that is ready to go upstream, uses old way too. So it will
  be 2 archs against one. Christoffer do you think the proposed way it
  better for your needs. Are you willing to make vGIC use it?
 
 In fact there have been two distinct interrupt controller emulations
 for PPC posted recently, Scott's and mine, with quite different
 interfaces.  In contrast to Scott's device control API, where the
 operations you would do for different interrupt controllers are quite
 different, mine attempted to define a much more abstract interface for
 interrupt controller emulations, with the idea that an interrupt
 controller subsystem connects a set of interrupt sources, each with
 some state, to a set of interrupt delivery mechanisms, one per vcpu,
 also with some state.
 
I agree with Scott that it is futile to try and come up with generic
irqchip configuration interface and I doubt it is needed from QEMU
or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES
interface and while it is possible to pass some information about
pic/ioapic using it there will be a lot of information that will not
fit there. For one there is global irqchips related state and proposed
interface only talk about interrupt sources. Another is that using
generic interface will require us to have a code that translate irqchip
representation into this generic one and back for no apparent gain.
Currently pic/ioapic state is very similar to what HW specification
defines and it is not going to change.

Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how
well it work for migration though. The interface suppose to transfer
irqchips state as is, but I see things like that in your code:

   mutex_lock(ics-lock);
   irqp-server = val  KVM_IRQ_SERVER_MASK;
   irqp-priority = val  KVM_IRQ_PRIORITY_SHIFT;
   irqp-resend = 0;
   irqp-masked_pending = 0;
   irqp-asserted = 0;

Why it is safe to initialize those values to default values during
migration? Wouldn't it be simpler and more correct to just transfer
the whole content of irqp from src to dst?

 Thus my interface had:
 
 - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that
   indicates the overall architecture of the interrupt subsystem,
 
 - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that
   return or modify the state of some set of interrupt sources
 
 - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register
   identifier space, that is used to save and restore the per-vcpu
   interrupt presentation state.
 
 Alternatively, I could modify my code to use the existing
 KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls.  If I
 were to do that I would want to rename the 'pad' field in struct
 kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of
 interrupt sources to be affected.  I'd also want to keep the ONE_REG
 identifier for the per-vcpu state.
It is preferable to the interface you propose since I do not think your
interface fits other interrupt controllers well. You can put nr field
into dummy[] payload, or replace pad with union {pad, nr}.

 
 Or I could change over to using Scott's device control API, though I
 have some objections to doing that.
 
 What is your advice about which interface to use?
 
The ideal situation would be if you and Scott agree on the details about
the interface. If you don't like something about Scott's interface we
can discuss it and shape it to something you agree with or even like.
I already asked Scott to introduce command interface. Scott does not

Re: [PATCH v2 12/15] target-ppc: Refactor debug output macros

2013-02-25 Thread Andreas Färber
Am 25.02.2013 13:54, schrieb Alexander Graf:
 
 On 21.02.2013, at 05:25, Andreas Färber wrote:
 
 Make debug output compile-testable even if disabled.

 Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc.

 Inline DEBUG_OP check in excp_helper.c.
 Inline LOG_MMU_STATE() in mmu_helper.c.
 Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c.

 Signed-off-by: Andreas Färber afaer...@suse.de
 
 I assume you verified that all the bits do get optimized out, right? :)

No, I didn't for each, my focus was to make debug code compile and keep
it compiling after my CPU changes. :)

Please read up on the new discussion of rth not liking static const and
proposing to go back to v1, modulo do { ... } while (0).

Like I said there, if finding a solution that pleases everyone fails,
then I will leave it to the maintainers (i.e., you) to choose and apply
a solution or to live with the resulting breakages.

Andreas

 
 Reviewed-by: Alexander Graf ag...@suse.de
 
 
 Alex
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ppc-next v2] target-ppc: Make host CPU a subclass of the host's CPU model

2013-02-25 Thread Alexander Graf

On 23.02.2013, at 22:22, Andreas Färber wrote:

 This avoids assigning individual class fields and contributors
 forgetting to add field assignments in KVM-only code.
 
 ppc_cpu_class_find_by_pvr() requires the CPU model classes to be
 registered, so defer host CPU type registration to kvm_arch_init().
 
 Only register the host CPU type if there is a class with matching PVR.
 This lets us drop error handling from instance_init.
 
 Signed-off-by: Andreas Färber afaer...@suse.de

Thanks, applied to ppc-next.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-02-25:
  On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-02-25:
  On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
  Avi Kivity wrote on 2013-02-25:
  I didn't really follow, but is the root cause the need to keep track
  of interrupt coalescing?  If so we can recommend that users use
  KVM_IRQ_LINE when coalescing is unneeded, and move interrupt
  injection with irq coalescing support to vcpu context.
  So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted
  interrupt is enabled to force users doesn't to use
  KVM_IRQ_LINE_STATUS. Does this acceptable?
  
  The only case in KVM that need to know the interrupt injection status is
  vlapic
  timer. But since vlapic timer and vcpu are always in same pcpu, so there 
  is no
  problem.
  
  Not really. The primary user of this interface is RTC interrupt
  re-injection for Windows guests.
  So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?
  
  Windows guests may experience timedrift under CPU overcommit scenario.
 Ok, I see. Seems we are stuck. :(
 Do you have any suggestion to solve or workaround current problem?
 
I see a couple of possible solutions:
1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
will be slow on newer kernels
2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
running during injection. This assumes that if vcpu is running and does
not process interrupt it is guest fault and the same can happen on real
HW too. Coalescing when vcpu is not running though is the result of CPU
overcommit and should be reported. Cons interface definition is kind of
murky.
3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
notifiers for interrupt reinjection. This requires us to add interface
for reporting EOI to userspace. This is not in the scope of this
patchset. Cons: need to introduce new interface (and the one that will
not work on AMD BTW)

Other ideas?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 54461] New: nVMX: Downgrading nested VMX features from user-space

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54461

   Summary: nVMX: Downgrading nested VMX features from user-space
   Product: Virtualization
   Version: unspecified
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: enhancement
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: n...@math.technion.ac.il
Regression: No


In nested SVM, the sub-features of SVM are all part of CPUID, so we already
have a user-space interface (KVM_GET_SUPPORTED_CPUID/KVM_SET_CPUID2) to set a
guest's features - which is, for example, useful for dictating a fixed set of
features across different hosts.

However, in nested VMX, the VMX sub-features offered or not to the guest are
not in CPUID, but rather in MSRs, and currently the
nested_vmx_setup_ctls_msrs() function determines their value - without any
option for user space to control this. We need some sort of new
KVM_SET_READONLY_MSRS ioctl for overriding  (downgrading)
nested_vmx_setup_ctls_msrs()'s decisions for a particular guest.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 53601] nVMX meta-bug

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53601


Nadav Har'El n...@math.technion.ac.il changed:

   What|Removed |Added

 Depends on||54461




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 54461] nVMX: Downgrading nested VMX features from user-space

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54461


Nadav Har'El n...@math.technion.ac.il changed:

   What|Removed |Added

 Blocks||53601




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 12/15] target-ppc: Refactor debug output macros

2013-02-25 Thread Alexander Graf

On 25.02.2013, at 14:14, Andreas Färber wrote:

 Am 25.02.2013 13:54, schrieb Alexander Graf:
 
 On 21.02.2013, at 05:25, Andreas Färber wrote:
 
 Make debug output compile-testable even if disabled.
 
 Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc.
 
 Inline DEBUG_OP check in excp_helper.c.
 Inline LOG_MMU_STATE() in mmu_helper.c.
 Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c.
 
 Signed-off-by: Andreas Färber afaer...@suse.de
 
 I assume you verified that all the bits do get optimized out, right? :)
 
 No, I didn't for each, my focus was to make debug code compile and keep
 it compiling after my CPU changes. :)
 
 Please read up on the new discussion of rth not liking static const and
 proposing to go back to v1, modulo do { ... } while (0).
 
 Like I said there, if finding a solution that pleases everyone fails,
 then I will leave it to the maintainers (i.e., you) to choose and apply
 a solution or to live with the resulting breakages.

I care about 2 things:

  * #ifdef works
  * if not defined then the code should get optimized out

:). Everything else is icing on top.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 53761] nVMX: MSR bitmap merging

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53761





--- Comment #1 from Nadav Har'El n...@math.technion.ac.il  2013-02-25 
13:59:58 ---
The same technique, and code, can be reused also for merging of IO bitmaps.
This will offer less of a performance impact than that on MSR bitmaps, because
the KVM L0 anyway wants exit on all IO ports (except the esoteric port 0x80
used for delays). So I am guessing no performance improvement will be seen
because of merging IO bitmaps.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Marcelo Tosatti
On Mon, Feb 25, 2013 at 06:55:58AM +, Zhang, Yang Z wrote:
  
  p1)
  
  cpu0  cpu1vcpu0
  test_and_set_bit(PIR-A)
  set KVM_REQ_EVENT
process REQ_EVENT
PIR-A-IRR
  
vcpu-mode=IN_GUEST
  
  if (vcpu0-guest_mode)
if (!t_a_s_bit(PIR notif))
send IPI
linux_pir_handler
  
t_a_s_b(PIR-B)=1
no PIR IPI sent
  
  p2)
  
  No, this exists with your implementation not mine.
  As I said, in this patch, it will make request after vcpu ==guest mode, 
  then send
  ipi. Even the ipi is lost, but the request still will be tracked. Because 
  we have the
  follow check:
  vcpu-mode = GUEST_MODE
  (ipi may arrived here and lost)
  local irq disable
  check request (this will ensure the pir modification will be picked up by 
  vcpu
  before vmentry)
  
  Please read the sequence above again, between p1) and p2). Yes, if the
  IPI is lost, the request _for the CPU which sent PIR IPI_ is guaranteed
  to be processed, but not the request for another cpu (cpu1).
 If no PIR IPI sent in cpu1, the delivered is false, and it will roll back to 
 old logic:
 __apic_accept_irq():
 if (!delivered) {
  kvm_make_request(KVM_REQ_EVENT, vcpu);
  kvm_vcpu_kick(vcpu);
 }
 This can solve the problem you mentioned above. Right?

Should not be doing this kick in the first place. One result of it is
that you'll always take a VM-exit if a second injection happens while a VCPU
has not handled the first one.

What is the drawback of the suggestion to unconditionally clear PIR
notification bit on VM-entry?

We can avoid it, but lets get it correct first  (BTW can stick a comment
saying FIXME: optimize) on that PIR clearing.

  cpu0
  
  check pir(pass)
  check irr(pass)
  injected = !test_and_set_bit(pir)
  
  versus
  
  cpu1
 xchg(pir)

  
  No information can be lost because all accesses to shared data are
  atomic.
 Sorry, I cannot get your point. Why 'xchg(pir)' can solve the problem I 
 mentioned above? Can you elaborate it?
 The spinlock I used is trying to protect the whole procedure of sync pir to 
 irr, not just access pir.

You're right, its the same problem as with the hardware update.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Marcelo Tosatti
On Mon, Feb 25, 2013 at 03:34:19PM +0200, Gleb Natapov wrote:
 On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-02-25:
   On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote:
   Gleb Natapov wrote on 2013-02-25:
   On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote:
   Avi Kivity wrote on 2013-02-25:
   I didn't really follow, but is the root cause the need to keep track
   of interrupt coalescing?  If so we can recommend that users use
   KVM_IRQ_LINE when coalescing is unneeded, and move interrupt
   injection with irq coalescing support to vcpu context.
   So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted
   interrupt is enabled to force users doesn't to use
   KVM_IRQ_LINE_STATUS. Does this acceptable?
   
   The only case in KVM that need to know the interrupt injection status 
   is
   vlapic
   timer. But since vlapic timer and vcpu are always in same pcpu, so 
   there is no
   problem.
   
   Not really. The primary user of this interface is RTC interrupt
   re-injection for Windows guests.
   So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well?
   
   Windows guests may experience timedrift under CPU overcommit scenario.
  Ok, I see. Seems we are stuck. :(
  Do you have any suggestion to solve or workaround current problem?
  
 I see a couple of possible solutions:
 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
 current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
 will be slow on newer kernels

Can add a capability to QEMU and enable APICv selectively only 
in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu
only when necessary (and KVM_IRQ_LINE otherwise).

Even a lock serializing injection is not safe because ON bit is cleared
before XCHG(PIR, 0). Must do something heavier (such as running on
target vcpu context).

 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
 running during injection. This assumes that if vcpu is running and does
 not process interrupt it is guest fault and the same can happen on real
 HW too. Coalescing when vcpu is not running though is the result of CPU
 overcommit and should be reported. Cons interface definition is kind of
 murky.
 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
 notifiers for interrupt reinjection. This requires us to add interface
 for reporting EOI to userspace. This is not in the scope of this
 patchset. Cons: need to introduce new interface (and the one that will
 not work on AMD BTW)

Breaks older userspace?
 
 Other ideas?

Can HW write a 'finished' bit after 6 in the reserved area? Suppose its
not a KVM-specific problem?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 54471] New: nVMX: TRUE* MSRs unnecessary

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54471

   Summary: nVMX: TRUE* MSRs unnecessary
   Product: Virtualization
   Version: unspecified
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: enhancement
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: n...@math.technion.ac.il
Regression: No


According to the spec, the TRUE* MSRs are only necessary if bit 55 of VMX_BASIC
is on. Since we don't set this bit, we don't need to provide these MSRs...
So probably we shouldn't.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 53601] nVMX meta-bug

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53601


Nadav Har'El n...@math.technion.ac.il changed:

   What|Removed |Added

 Depends on||54471




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 54471] nVMX: TRUE* MSRs unnecessary

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54471


Nadav Har'El n...@math.technion.ac.il changed:

   What|Removed |Added

 Blocks||53601




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Marcelo Tosatti
On Mon, Feb 25, 2013 at 11:00:21AM -0300, Marcelo Tosatti wrote:
  I see a couple of possible solutions:
  1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
  current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
  will be slow on newer kernels
 
 Can add a capability to QEMU and enable APICv selectively only 
 in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu
 only when necessary (and KVM_IRQ_LINE otherwise).

Bad idea. What happens with mixed scenarios.

 Even a lock serializing injection is not safe because ON bit is cleared
 before XCHG(PIR, 0). Must do something heavier (such as running on
 target vcpu context).

Note always running on target vcpu is likely to be slower than no-APICv.

So need to do something heavier on the kernel under serialization, if 
firmware cannot be changed (injection from simultaneous CPUs should be
rare so if data to serialize __accept_apic_irq is cache-line aligned 
it should reduce performance impact).

  2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
  running during injection. This assumes that if vcpu is running and does
  not process interrupt it is guest fault and the same can happen on real
  HW too. Coalescing when vcpu is not running though is the result of CPU
  overcommit and should be reported. Cons interface definition is kind of
  murky.
  3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
  notifiers for interrupt reinjection. This requires us to add interface
  for reporting EOI to userspace. This is not in the scope of this
  patchset. Cons: need to introduce new interface (and the one that will
  not work on AMD BTW)

Is there no int-ack notification at RTC HW level?

 Breaks older userspace?
  
  Other ideas?
 
 Can HW write a 'finished' bit after 6 in the reserved area? Suppose its
 not a KVM-specific problem?
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 54471] New: nVMX: TRUE* MSRs unnecessary

2013-02-25 Thread Jan Kiszka
On 2013-02-25 15:04, bugzilla-dae...@bugzilla.kernel.org wrote:
 https://bugzilla.kernel.org/show_bug.cgi?id=54471
 
Summary: nVMX: TRUE* MSRs unnecessary
Product: Virtualization
Version: unspecified
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: enhancement
   Priority: P1
  Component: kvm
 AssignedTo: virtualization_...@kernel-bugs.osdl.org
 ReportedBy: n...@math.technion.ac.il
 Regression: No
 
 
 According to the spec, the TRUE* MSRs are only necessary if bit 55 of 
 VMX_BASIC
 is on. Since we don't set this bit, we don't need to provide these MSRs...
 So probably we shouldn't.

I tend to say we should (i.e. set bit 55) as there is the risk that
buggy guests could needlessly get confused when we expose very recent
features (like EPT) and do not provide such old-fashioned stuff.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 54471] nVMX: TRUE* MSRs unnecessary

2013-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=54471





--- Comment #1 from Anonymous Emailer anonym...@kernel-bugs.osdl.org  
2013-02-25 14:29:08 ---
Reply-To: jan.kis...@siemens.com

On 2013-02-25 15:04, bugzilla-dae...@bugzilla.kernel.org wrote:
 https://bugzilla.kernel.org/show_bug.cgi?id=54471
 
Summary: nVMX: TRUE* MSRs unnecessary
Product: Virtualization
Version: unspecified
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: enhancement
   Priority: P1
  Component: kvm
 AssignedTo: virtualization_...@kernel-bugs.osdl.org
 ReportedBy: n...@math.technion.ac.il
 Regression: No
 
 
 According to the spec, the TRUE* MSRs are only necessary if bit 55 of 
 VMX_BASIC
 is on. Since we don't set this bit, we don't need to provide these MSRs...
 So probably we shouldn't.

I tend to say we should (i.e. set bit 55) as there is the risk that
buggy guests could needlessly get confused when we expose very recent
features (like EPT) and do not provide such old-fashioned stuff.

Jan

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Zhang, Yang Z
Marcelo Tosatti wrote on 2013-02-25:
 On Mon, Feb 25, 2013 at 06:55:58AM +, Zhang, Yang Z wrote:
 
 p1)
 
 cpu0  cpu1vcpu0
 test_and_set_bit(PIR-A)
 set KVM_REQ_EVENT
   process REQ_EVENT
   PIR-A-IRR
 
   vcpu-mode=IN_GUEST
 
 if (vcpu0-guest_mode)
   if (!t_a_s_bit(PIR notif))
   send IPI
   linux_pir_handler
 
   t_a_s_b(PIR-B)=1
   no PIR IPI sent
 
 p2)
 
 No, this exists with your implementation not mine.
 As I said, in this patch, it will make request after vcpu ==guest mode, 
 then
 send
 ipi. Even the ipi is lost, but the request still will be tracked.
 Because we have the follow check:
 vcpu-mode = GUEST_MODE
 (ipi may arrived here and lost)
 local irq disable
 check request (this will ensure the pir modification will be picked up by 
 vcpu
 before vmentry)
 
 Please read the sequence above again, between p1) and p2). Yes, if the
 IPI is lost, the request _for the CPU which sent PIR IPI_ is guaranteed
 to be processed, but not the request for another cpu (cpu1).
 If no PIR IPI sent in cpu1, the delivered is false, and it will roll back to 
 old logic:
 __apic_accept_irq():
 if (!delivered) {
  kvm_make_request(KVM_REQ_EVENT, vcpu);
  kvm_vcpu_kick(vcpu);
 }
 This can solve the problem you mentioned above. Right?
 
 Should not be doing this kick in the first place. One result of it is
 that you'll always take a VM-exit if a second injection happens while a VCPU
 has not handled the first one.
You are right. 
 
 What is the drawback of the suggestion to unconditionally clear PIR
 notification bit on VM-entry?
The only thing is we need to traverse the whole pir when calling 
sync_pir_to_irr even the pir is empty.


 We can avoid it, but lets get it correct first  (BTW can stick a comment
 saying FIXME: optimize) on that PIR clearing.
Ok, I will adopt your suggestion. BTW, Where should the comment be add? on 
sync_pir_to_irr()?

 
 cpu0
 
 check pir(pass)
 check irr(pass)
 injected = !test_and_set_bit(pir)
 
 versus
 
 cpu1
 xchg(pir)
 
 
 No information can be lost because all accesses to shared data are
 atomic.
 Sorry, I cannot get your point. Why 'xchg(pir)' can solve the problem I
 mentioned above? Can you elaborate it? The spinlock I used is trying to
 protect the whole procedure of sync pir to irr,
 not just access pir.
 
 You're right, its the same problem as with the hardware update.


Best regards,
Yang


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Alexander Graf

On 24.02.2013, at 16:46, Gleb Natapov wrote:

 On Thu, Feb 21, 2013 at 08:17:54PM -0600, Scott Wood wrote:
 On 02/21/2013 02:22:09 AM, Gleb Natapov wrote:
 On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote:
 On 02/20/2013 07:09:49 AM, Gleb Natapov wrote:
 On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote:
 On 02/19/2013 06:24:18 AM, Gleb Natapov wrote:
 On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote:
 The ability to set/get attributes is needed.  Sorry, but get
 or set
 one blob of data, up to 512 bytes, for the entire irqchip is
 just
 not good enough -- assuming you don't want us to start
 sticking
 pointers and commands in *that* data. :-)
 
 Proposed interface sticks pointers into ioctl data, so why doing
 the same
 for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile.
 
 There's a difference between putting a pointer in an ioctl
 control
 structure that is specifically documented as being that way
 (as in
 ONE_REG), versus taking an ioctl that claims to be
 setting/getting a
 blob of state and embedding pointers in it.  It would be like
 sticking a pointer in the attribute payload of this API, which I
 think is something to be discouraged.
 If documentation is what differentiate for you between silly
 and smart
 then write documentation instead of new interfaces.
 
 You mean like what we did with SREGS, that got deprecated and
 replaced with ONE_REG?
 
 SREGS is implemented by ppc. I see ONE_REG as addition to REGS
 interface. You can access all register at once or you can access them
 one by one. If there is a need we can add MULTIPLE_REGS that will get
 list of requested REGS.
 
 http://www.spinics.net/lists/kvm-ppc/msg04876.html
 http://www.spinics.net/lists/kvm-ppc/msg05842.html
 
 If Alex prefers ONE_REG interface this is his call :)

There's a reason I prefer it :). We ran out of space in the SREGS call, we had 
a bit map with multiple chunks of data you could set in there and worst of all, 
when we ran out of space we didn't realize it until the code was already in 
Linus' tree!

The normal here's a chunk of data ioctl style works great when you know all 
or most of the details of what you want to transfer. In the case of kvm, we 
need to expose an interface we

  a) Don't control itself. Hardware guys make hardware. We just follow.
  b) Is pretty complex. There can be a lot of state in a CPU / device.

So because if that we needed something more flexible and dynamic. Something 
where we could add registers we forgot without jumping through lots of hoops.

That's why I prefer ONE_REG for vcpus. Since I see the same danger on device 
emulation, I tend to prefer it here as well.

 
 The interface is not over generic i.e it does
 not try to replace KVM_RUN by writing special register.
 
 Sigh.
 
   And we
 don't necessarily want to set *everything*.
 What are those cases? You do need to on reset/migration.
 
 Why do we want to set all the registers on reset, rather than tell
 the in-kernel device to reset?  The default state came from the
 kernel in the first place on irqchip creation...
 
 I have nothing against telling in-kernel device to reset provided there
 is a way to do so, which current interface lacks. Reset in userspase has
 its advantage too: bugs are easier to fix, there may be different kind
 of resets (hard/soft).

The same argument holds for state access. If you suddenly realize oh, I need 
to also set this one hidden state bit in my hardware you only want to add that 
specific write on reset.

If you suddenly find a race condition in migration that happens very rarely, 
you want to add the one register that you forgot to synchronize in addition in 
your new code. You don't want to change the whole ioctl passed struct.

That's why we have the padding in our ioctl structs. And feature bitmaps. And 
once we exceed them, we're screwed.

I want to make sure we don't hit that point.

Actually, there's a really easy way to understand what this interface does. It 
basically implements a state get/set ioctl which allows for very fine grained 
control over your ioctl struct. It does that by only passing one register at a 
time.

If we run into concurrency issues (multiple registers need to be set in a 
locked manner at the same time), we can always add a new ioctl that sets 
multiple attrs/regs at once. For CPUs, we haven't hit that case yet.

 
   It'd also be using
 KVM_SET_IRQCHIP to read data, which is the sort of thing you
 object
 to later on regarding KVM_IRQ_LINE_STATUS.
 
 Do not see why.
 
 It's either that, or have the data direction of the chip field in
 KVM_GET_IRQCHIP not be entirely in the get direction.
 
 Still do not follow. Example?
 
 struct kvm_irqchip has chip_id, pad, and chip.  pad is
 insufficient to communicate attribute type plus a pointer.  So if we
 want to provide a pointer for the kernel to write the attribute
 into, it has to read from memory that 

[PATCH v3 0/5] kvm: Make ioeventfd usable on s390.

2013-02-25 Thread Cornelia Huck
Here's the latest version of my patch series enabling ioeventfds
on s390, again against kvm-next.

Patches 1 and 2 (cleaning up initialization and exporting the virtio-ccw
api) would make sense even independent of the ioeventfd enhancements.

Patches 3-5 are concerned with adding a new type of ioeventfds for
virtio-ccw notifications on s390. The naming is now hopefully clearer.
We won't add ioeventfd support for the legacy s390-virtio transport.

Please consider applying.

v2 - v3:
- Added a patch exporting the virtio-ccw api and use it for the
  diagnose implementation.
- Better naming: We're dealing with virtio-ccw notifications only.
v1 - v2:
- Move irqfd initialization from a module init function to kvm_init,
  eliminating the need for a second module for kvm/s390.
- Use kvm_io_device for s390 css devices.

Cornelia Huck (5):
  KVM: Initialize irqfd from kvm_init().
  KVM: s390: Export virtio-ccw api.
  KVM: Introduce KVM_VIRTIO_CCW_NOTIFY_BUS.
  KVM: ioeventfd for virtio-ccw devices.
  KVM: s390: Wire up ioeventfd.

 Documentation/virtual/kvm/api.txt   |  8 
 arch/s390/include/uapi/asm/Kbuild   |  1 +
 arch/s390/include/uapi/asm/virtio-ccw.h | 21 +
 arch/s390/kvm/Kconfig   |  1 +
 arch/s390/kvm/Makefile  |  2 +-
 arch/s390/kvm/diag.c| 26 ++
 arch/s390/kvm/kvm-s390.c|  1 +
 drivers/s390/kvm/virtio_ccw.c   |  5 +
 include/linux/kvm_host.h| 14 ++
 include/uapi/linux/kvm.h|  3 +++
 virt/kvm/eventfd.c  | 21 ++---
 virt/kvm/kvm_main.c |  6 ++
 12 files changed, 97 insertions(+), 12 deletions(-)
 create mode 100644 arch/s390/include/uapi/asm/virtio-ccw.h

-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/5] KVM: Initialize irqfd from kvm_init().

2013-02-25 Thread Cornelia Huck
Currently, eventfd introduces module_init/module_exit functions
to initialize/cleanup the irqfd workqueue. This only works, however,
if no other module_init/module_exit functions are built into the
same module.

Let's just move the initialization and cleanup to kvm_init and kvm_exit.
This way, it is also clearer where kvm startup may fail.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 include/linux/kvm_host.h | 13 +
 virt/kvm/eventfd.c   |  7 ++-
 virt/kvm/kvm_main.c  |  6 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 722cae7..3b768ef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -423,6 +423,19 @@ void kvm_vcpu_uninit(struct kvm_vcpu *vcpu);
 int __must_check vcpu_load(struct kvm_vcpu *vcpu);
 void vcpu_put(struct kvm_vcpu *vcpu);
 
+#ifdef __KVM_HAVE_IOAPIC
+int kvm_irqfd_init(void);
+void kvm_irqfd_exit(void);
+#else
+static inline int kvm_irqfd_init(void)
+{
+   return 0;
+}
+
+static inline void kvm_irqfd_exit(void)
+{
+}
+#endif
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module);
 void kvm_exit(void);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index b6eea5c..f0ced1a 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -544,7 +544,7 @@ void kvm_irq_routing_update(struct kvm *kvm,
  * aggregated from all vm* instances. We need our own isolated single-thread
  * queue to prevent deadlock against flushing the normal work-queue.
  */
-static int __init irqfd_module_init(void)
+int kvm_irqfd_init(void)
 {
irqfd_cleanup_wq = create_singlethread_workqueue(kvm-irqfd-cleanup);
if (!irqfd_cleanup_wq)
@@ -553,13 +553,10 @@ static int __init irqfd_module_init(void)
return 0;
 }
 
-static void __exit irqfd_module_exit(void)
+void kvm_irqfd_exit(void)
 {
destroy_workqueue(irqfd_cleanup_wq);
 }
-
-module_init(irqfd_module_init);
-module_exit(irqfd_module_exit);
 #endif
 
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index adc68fe..7c188a3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2920,6 +2920,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
int r;
int cpu;
 
+   r = kvm_irqfd_init();
+   if (r)
+   goto out_irqfd;
r = kvm_arch_init(opaque);
if (r)
goto out_fail;
@@ -3000,6 +3003,8 @@ out_free_0a:
 out_free_0:
kvm_arch_exit();
 out_fail:
+   kvm_irqfd_exit();
+out_irqfd:
return r;
 }
 EXPORT_SYMBOL_GPL(kvm_init);
@@ -3016,6 +3021,7 @@ void kvm_exit(void)
on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_arch_hardware_unsetup();
kvm_arch_exit();
+   kvm_irqfd_exit();
free_cpumask_var(cpus_hardware_enabled);
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/5] KVM: Introduce KVM_VIRTIO_CCW_NOTIFY_BUS.

2013-02-25 Thread Cornelia Huck
Add a new bus type for virtio-ccw devices on s390.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 include/linux/kvm_host.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b768ef..206247f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -148,6 +148,7 @@ struct kvm_io_bus {
 enum kvm_bus {
KVM_MMIO_BUS,
KVM_PIO_BUS,
+   KVM_VIRTIO_CCW_NOTIFY_BUS,
KVM_NR_BUSES
 };
 
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/5] KVM: ioeventfd for virtio-ccw devices.

2013-02-25 Thread Cornelia Huck
Enhance KVM_IOEVENTFD with a new flag that allows to attach to virtio-ccw
devices on s390 via the KVM_VIRTIO_CCW_NOTIFY_BUS.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt |  8 
 include/uapi/linux/kvm.h  |  3 +++
 virt/kvm/eventfd.c| 14 --
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index c2534c3..86232d6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1468,15 +1468,23 @@ struct kvm_ioeventfd {
__u8  pad[36];
 };
 
+For the special case of virtio-ccw devices on s390, the ioevent is matched
+to a subchannel/virtqueue tuple instead.
+
 The following flags are defined:
 
 #define KVM_IOEVENTFD_FLAG_DATAMATCH (1  kvm_ioeventfd_flag_nr_datamatch)
 #define KVM_IOEVENTFD_FLAG_PIO   (1  kvm_ioeventfd_flag_nr_pio)
 #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1  kvm_ioeventfd_flag_nr_deassign)
+#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
+   (1  kvm_ioeventfd_flag_nr_virtio_ccw_notify)
 
 If datamatch flag is set, the event will be signaled only if the written value
 to the registered address is equal to datamatch in struct kvm_ioeventfd.
 
+For virtio-ccw devices, addr contains the subchannel id and datamatch the
+virtqueue index.
+
 
 4.60 KVM_DIRTY_TLB
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9a2db57..8f3e5ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -448,12 +448,15 @@ enum {
kvm_ioeventfd_flag_nr_datamatch,
kvm_ioeventfd_flag_nr_pio,
kvm_ioeventfd_flag_nr_deassign,
+   kvm_ioeventfd_flag_nr_virtio_ccw_notify,
kvm_ioeventfd_flag_nr_max,
 };
 
 #define KVM_IOEVENTFD_FLAG_DATAMATCH (1  kvm_ioeventfd_flag_nr_datamatch)
 #define KVM_IOEVENTFD_FLAG_PIO   (1  kvm_ioeventfd_flag_nr_pio)
 #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1  kvm_ioeventfd_flag_nr_deassign)
+#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
+   (1  kvm_ioeventfd_flag_nr_virtio_ccw_notify)
 
 #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1  kvm_ioeventfd_flag_nr_max) - 1)
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f0ced1a..8de3cd7 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -679,11 +679,16 @@ static int
 kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
int   pio = args-flags  KVM_IOEVENTFD_FLAG_PIO;
-   enum kvm_bus  bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS;
+   int   ccw;
+   enum kvm_bus  bus_idx;
struct _ioeventfd*p;
struct eventfd_ctx   *eventfd;
int   ret;
 
+   ccw = args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY;
+   bus_idx = pio ? KVM_PIO_BUS :
+   ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS :
+   KVM_MMIO_BUS;
/* must be natural-word sized */
switch (args-len) {
case 1:
@@ -759,11 +764,16 @@ static int
 kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
int   pio = args-flags  KVM_IOEVENTFD_FLAG_PIO;
-   enum kvm_bus  bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS;
+   int   ccw;
+   enum kvm_bus  bus_idx;
struct _ioeventfd*p, *tmp;
struct eventfd_ctx   *eventfd;
int   ret = -ENOENT;
 
+   ccw = args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY;
+   bus_idx = pio ? KVM_PIO_BUS :
+   ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS :
+   KVM_MMIO_BUS;
eventfd = eventfd_ctx_fdget(args-fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/5] KVM: s390: Wire up ioeventfd.

2013-02-25 Thread Cornelia Huck
Enable ioeventfd support on s390 and hook up diagnose 500 virtio-ccw
notifications.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/kvm/Kconfig|  1 +
 arch/s390/kvm/Makefile   |  2 +-
 arch/s390/kvm/diag.c | 26 ++
 arch/s390/kvm/kvm-s390.c |  1 +
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index b58dd86..3c43e30 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
+   select HAVE_KVM_EVENTFD
---help---
  Support hosting paravirtualized guest machines using the SIE
  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 3975722..8fe9d65 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -6,7 +6,7 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
-common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
+common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o)
 
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index a390687..96907f2 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -13,6 +13,7 @@
 
 #include linux/kvm.h
 #include linux/kvm_host.h
+#include asm/virtio-ccw.h
 #include kvm-s390.h
 #include trace.h
 #include trace-s390.h
@@ -104,6 +105,29 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
return -EREMOTE;
 }
 
+static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
+{
+   int ret, idx;
+
+   /* No virtio-ccw notification? Get out quickly. */
+   if (!vcpu-kvm-arch.css_support ||
+   (vcpu-run-s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY))
+   return -EOPNOTSUPP;
+
+   idx = srcu_read_lock(vcpu-kvm-srcu);
+   /*
+* The layout is as follows:
+* - gpr 2 contains the subchannel id (passed as addr)
+* - gpr 3 contains the virtqueue index (passed as datamatch)
+*/
+   ret = kvm_io_bus_write(vcpu-kvm, KVM_VIRTIO_CCW_NOTIFY_BUS,
+   vcpu-run-s.regs.gprs[2],
+   8, vcpu-run-s.regs.gprs[3]);
+   srcu_read_unlock(vcpu-kvm-srcu, idx);
+   /* kvm_io_bus_write returns -EOPNOTSUPP if it found no match. */
+   return ret;
+}
+
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
 {
int code = (vcpu-arch.sie_block-ipb  0xfff)  16;
@@ -118,6 +142,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
return __diag_time_slice_end_directed(vcpu);
case 0x308:
return __diag_ipl_functions(vcpu);
+   case 0x500:
+   return __diag_virtio_hypercall(vcpu);
default:
return -EOPNOTSUPP;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f822d36..04d2454 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -142,6 +142,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_ONE_REG:
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_S390_CSS_SUPPORT:
+   case KVM_CAP_IOEVENTFD:
r = 1;
break;
case KVM_CAP_NR_VCPUS:
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/2] qemu: ioeventfd for virtio-ccw.

2013-02-25 Thread Cornelia Huck
Here's the current version of my patch enabling ioeventfds for virtio-ccw.

The linux-headers patch only adds the new flag for ioeventfds.

Wiring up ioeventfds has improved performance drastically (up to twice
as fast for a simple dd). Notifier support is something we plan to do
next.

Please consider applying if this makes sense.

v2 - v3:
- Adapt to changed ioeventfd interface again.
v1 - v2:
- Adapt to changed ioeventfd interface.

Cornelia Huck (2):
  linux-headers: Update with ioeventfd changes.
  virtio-ccw: Wire up ioeventfd.

 hw/s390x/css.c|   2 +-
 hw/s390x/css.h|   1 +
 hw/s390x/virtio-ccw.c | 114 ++
 hw/s390x/virtio-ccw.h |   7 +++
 linux-headers/linux/kvm.h |   3 ++
 target-s390x/cpu.h|  16 +++
 target-s390x/kvm.c|  19 
 7 files changed, 161 insertions(+), 1 deletion(-)

-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] linux-headers: Update with ioeventfd changes.

2013-02-25 Thread Cornelia Huck
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 linux-headers/linux/kvm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 5af9357..fc3d7ed 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -448,12 +448,15 @@ enum {
kvm_ioeventfd_flag_nr_datamatch,
kvm_ioeventfd_flag_nr_pio,
kvm_ioeventfd_flag_nr_deassign,
+   kvm_ioeventfd_flag_nr_virtio_ccw_notify,
kvm_ioeventfd_flag_nr_max,
 };
 
 #define KVM_IOEVENTFD_FLAG_DATAMATCH (1  kvm_ioeventfd_flag_nr_datamatch)
 #define KVM_IOEVENTFD_FLAG_PIO   (1  kvm_ioeventfd_flag_nr_pio)
 #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1  kvm_ioeventfd_flag_nr_deassign)
+#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
+   (1  kvm_ioeventfd_flag_nr_virtio_ccw_notify)
 
 #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1  kvm_ioeventfd_flag_nr_max) - 1)
 
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] virtio-ccw: Wire up ioeventfd.

2013-02-25 Thread Cornelia Huck
On hosts that support ioeventfd, make use of it for host-to-guest
notifications via diagnose 500.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/css.c|   2 +-
 hw/s390x/css.h|   1 +
 hw/s390x/virtio-ccw.c | 114 ++
 hw/s390x/virtio-ccw.h |   7 
 target-s390x/cpu.h|  16 +++
 target-s390x/kvm.c|  19 +
 6 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 85f6f22..82e6746 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -68,7 +68,7 @@ int css_create_css_image(uint8_t cssid, bool default_image)
 return 0;
 }
 
-static uint16_t css_build_subchannel_id(SubchDev *sch)
+uint16_t css_build_subchannel_id(SubchDev *sch)
 {
 if (channel_subsys-max_cssid  0) {
 return (sch-cssid  8) | (1  3) | (sch-ssid  1) | 1;
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 85ed05d..b536ab5 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -90,6 +90,7 @@ bool css_devno_used(uint8_t cssid, uint8_t ssid, uint16_t 
devno);
 void css_subch_assign(uint8_t cssid, uint8_t ssid, uint16_t schid,
   uint16_t devno, SubchDev *sch);
 void css_sch_build_virtual_schib(SubchDev *sch, uint8_t chpid, uint8_t type);
+uint16_t css_build_subchannel_id(SubchDev *sch);
 void css_reset(void);
 void css_reset_sch(SubchDev *sch);
 void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, uint16_t rsid);
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index d92e427..23b8092 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -63,6 +63,84 @@ VirtIODevice *virtio_ccw_get_vdev(SubchDev *sch)
 return vdev;
 }
 
+static int virtio_ccw_set_guest2host_notifier(VirtioCcwDevice *dev, int n,
+  bool assign, bool set_handler)
+{
+VirtQueue *vq = virtio_get_queue(dev-vdev, n);
+EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+int r = 0;
+SubchDev *sch = dev-sch;
+uint32_t sch_id = (css_build_subchannel_id(sch)  16) | sch-schid;
+
+if (assign) {
+r = event_notifier_init(notifier, 1);
+if (r  0) {
+error_report(%s: unable to init event notifier: %d, __func__, r);
+return r;
+}
+virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
+s390_assign_subch_ioeventfd(event_notifier_get_fd(notifier), sch_id,
+n, assign);
+} else {
+virtio_queue_set_host_notifier_fd_handler(vq, false, false);
+s390_assign_subch_ioeventfd(event_notifier_get_fd(notifier), sch_id,
+n, assign);
+event_notifier_cleanup(notifier);
+}
+return r;
+}
+
+static void virtio_ccw_start_ioeventfd(VirtioCcwDevice *dev)
+{
+int n, r;
+
+if (!(dev-flags  VIRTIO_CCW_FLAG_USE_IOEVENTFD) ||
+dev-ioeventfd_started) {
+return;
+}
+for (n = 0; n  VIRTIO_PCI_QUEUE_MAX; n++) {
+if (!virtio_queue_get_num(dev-vdev, n)) {
+continue;
+}
+r = virtio_ccw_set_guest2host_notifier(dev, n, true, true);
+if (r  0) {
+goto assign_error;
+}
+}
+dev-ioeventfd_started = true;
+return;
+
+  assign_error:
+while (--n = 0) {
+if (!virtio_queue_get_num(dev-vdev, n)) {
+continue;
+}
+r = virtio_ccw_set_guest2host_notifier(dev, n, false, false);
+assert(r = 0);
+}
+dev-ioeventfd_started = false;
+/* Disable ioeventfd for this device. */
+dev-flags = ~VIRTIO_CCW_FLAG_USE_IOEVENTFD;
+error_report(%s: failed. Fallback to userspace (slower)., __func__);
+}
+
+static void virtio_ccw_stop_ioeventfd(VirtioCcwDevice *dev)
+{
+int n, r;
+
+if (!dev-ioeventfd_started) {
+return;
+}
+for (n = 0; n  VIRTIO_PCI_QUEUE_MAX; n++) {
+if (!virtio_queue_get_num(dev-vdev, n)) {
+continue;
+}
+r = virtio_ccw_set_guest2host_notifier(dev, n, false, false);
+assert(r = 0);
+}
+dev-ioeventfd_started = false;
+}
+
 VirtualCssBus *virtual_css_bus_init(void)
 {
 VirtualCssBus *cbus;
@@ -187,6 +265,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 break;
 case CCW_CMD_VDEV_RESET:
+virtio_ccw_stop_ioeventfd(dev);
 virtio_reset(dev-vdev);
 ret = 0;
 break;
@@ -313,10 +392,16 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 ret = -EFAULT;
 } else {
 status = ldub_phys(ccw.cda);
+if (!(status  VIRTIO_CONFIG_S_DRIVER_OK)) {
+virtio_ccw_stop_ioeventfd(dev);
+}
 virtio_set_status(dev-vdev, status);
 if (dev-vdev-status == 0) {
 virtio_reset(dev-vdev);
 }
+if (status  VIRTIO_CONFIG_S_DRIVER_OK) {
+virtio_ccw_start_ioeventfd(dev);

[PATCH v3 2/5] KVM: s390: Export virtio-ccw api.

2013-02-25 Thread Cornelia Huck
Export the virtio-ccw api in a header for usage by other code.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/uapi/asm/Kbuild   |  1 +
 arch/s390/include/uapi/asm/virtio-ccw.h | 21 +
 drivers/s390/kvm/virtio_ccw.c   |  5 +
 3 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 arch/s390/include/uapi/asm/virtio-ccw.h

diff --git a/arch/s390/include/uapi/asm/Kbuild 
b/arch/s390/include/uapi/asm/Kbuild
index 7bf68ff..9ccd190 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -44,5 +44,6 @@ header-y += termios.h
 header-y += types.h
 header-y += ucontext.h
 header-y += unistd.h
+header-y += virtio-ccw.h
 header-y += vtoc.h
 header-y += zcrypt.h
diff --git a/arch/s390/include/uapi/asm/virtio-ccw.h 
b/arch/s390/include/uapi/asm/virtio-ccw.h
new file mode 100644
index 000..a9a4ebf
--- /dev/null
+++ b/arch/s390/include/uapi/asm/virtio-ccw.h
@@ -0,0 +1,21 @@
+/*
+ * Definitions for virtio-ccw devices.
+ *
+ * Copyright IBM Corp. 2013
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *  Author(s): Cornelia Huck cornelia.h...@de.ibm.com
+ */
+#ifndef __KVM_VIRTIO_CCW_H
+#define __KVM_VIRTIO_CCW_H
+
+/* Alignment of vring buffers. */
+#define KVM_VIRTIO_CCW_RING_ALIGN 4096
+
+/* Subcode for diagnose 500 (virtio hypercall). */
+#define KVM_S390_VIRTIO_CCW_NOTIFY 3
+
+#endif
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 3217dfe..c5532b3 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -31,6 +31,7 @@
 #include asm/irq.h
 #include asm/cio.h
 #include asm/ccwdev.h
+#include asm/virtio-ccw.h
 
 /*
  * virtio related functions
@@ -79,10 +80,6 @@ struct virtio_ccw_vq_info {
struct list_head node;
 };
 
-#define KVM_VIRTIO_CCW_RING_ALIGN 4096
-
-#define KVM_S390_VIRTIO_CCW_NOTIFY 3
-
 #define CCW_CMD_SET_VQ 0x13
 #define CCW_CMD_VDEV_RESET 0x33
 #define CCW_CMD_SET_IND 0x43
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Alexander Graf

On 25.02.2013, at 14:09, Gleb Natapov wrote:

 On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote:
 On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote:
 Copying Christoffer since ARM has in kernel irq chip too.
 
 On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote:
 Currently, devices that are emulated inside KVM are configured in a
 hardcoded manner based on an assumption that any given architecture
 only has one way to do it.  If there's any need to access device state,
 it is done through inflexible one-purpose-only IOCTLs (e.g.
 KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
 cumbersome and depletes a limited numberspace.
 
 This API provides a mechanism to instantiate a device of a certain
 type, returning an ID that can be used to set/get attributes of the
 device.  Attributes may include configuration parameters (e.g.
 register base address), device state, operational commands, etc.  It
 is similar to the ONE_REG API, except that it acts on devices rather
 than vcpus.
 You are not only provide different way to create in kernel irq chip you
 also use an alternate way to trigger interrupt lines. Before going into
 interface specifics lets think about whether it is really worth it? x86
 obviously support old way and will have to for some, very long, time.
 ARM vGIC code, that is ready to go upstream, uses old way too. So it will
 be 2 archs against one. Christoffer do you think the proposed way it
 better for your needs. Are you willing to make vGIC use it?
 
 In fact there have been two distinct interrupt controller emulations
 for PPC posted recently, Scott's and mine, with quite different
 interfaces.  In contrast to Scott's device control API, where the
 operations you would do for different interrupt controllers are quite
 different, mine attempted to define a much more abstract interface for
 interrupt controller emulations, with the idea that an interrupt
 controller subsystem connects a set of interrupt sources, each with
 some state, to a set of interrupt delivery mechanisms, one per vcpu,
 also with some state.
 
 I agree with Scott that it is futile to try and come up with generic
 irqchip configuration interface and I doubt it is needed from QEMU
 or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES
 interface and while it is possible to pass some information about
 pic/ioapic using it there will be a lot of information that will not
 fit there. For one there is global irqchips related state and proposed
 interface only talk about interrupt sources. Another is that using
 generic interface will require us to have a code that translate irqchip
 representation into this generic one and back for no apparent gain.
 Currently pic/ioapic state is very similar to what HW specification
 defines and it is not going to change.
 
 Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how
 well it work for migration though. The interface suppose to transfer
 irqchips state as is, but I see things like that in your code:
 
   mutex_lock(ics-lock);
   irqp-server = val  KVM_IRQ_SERVER_MASK;
   irqp-priority = val  KVM_IRQ_PRIORITY_SHIFT;
   irqp-resend = 0;
   irqp-masked_pending = 0;
   irqp-asserted = 0;
 
 Why it is safe to initialize those values to default values during
 migration? Wouldn't it be simpler and more correct to just transfer
 the whole content of irqp from src to dst?
 
 Thus my interface had:
 
 - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that
  indicates the overall architecture of the interrupt subsystem,
 
 - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that
  return or modify the state of some set of interrupt sources
 
 - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register
  identifier space, that is used to save and restore the per-vcpu
  interrupt presentation state.
 
 Alternatively, I could modify my code to use the existing
 KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls.  If I
 were to do that I would want to rename the 'pad' field in struct
 kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of
 interrupt sources to be affected.  I'd also want to keep the ONE_REG
 identifier for the per-vcpu state.
 It is preferable to the interface you propose since I do not think your
 interface fits other interrupt controllers well. You can put nr field
 into dummy[] payload, or replace pad with union {pad, nr}.
 
 
 Or I could change over to using Scott's device control API, though I
 have some objections to doing that.
 
 What is your advice about which interface to use?
 
 The ideal situation would be if you and Scott agree on the details about
 the interface. If you don't like something about Scott's interface we
 can discuss it and shape it to something you agree with or even like.
 I already asked Scott to 

Re: [PATCH v5 1/3] VFIO: Wrapper for getting reference to vfio_device from device

2013-02-25 Thread Alex Williamson
On Sat, 2013-02-23 at 23:25 -0600, Vijay Mohan Pandarathil wrote:
   - Added vfio_device_get_from_dev() as wrapper to get
   reference to vfio_device from struct device.
 
   - Added vfio_device_data() as a wrapper to get device_data from
   vfio_device.
 
 Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
 ---
  drivers/vfio/vfio.c  | 47 ++-
  include/linux/vfio.h |  3 +++
  2 files changed, 37 insertions(+), 13 deletions(-)
 
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index 12c264d..863d1d3 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -407,12 +407,13 @@ static void vfio_device_release(struct kref *kref)
  }
  
  /* Device reference always implies a group reference */
 -static void vfio_device_put(struct vfio_device *device)
 +void vfio_device_put(struct vfio_device *device)
  {
   struct vfio_group *group = device-group;
   kref_put_mutex(device-kref, vfio_device_release, group-device_lock);
   vfio_group_put(group);
  }
 +EXPORT_SYMBOL_GPL(vfio_device_put);
  
  static void vfio_device_get(struct vfio_device *device)
  {
 @@ -642,8 +643,12 @@ int vfio_add_group_dev(struct device *dev,
  }
  EXPORT_SYMBOL_GPL(vfio_add_group_dev);
  
 -/* Test whether a struct device is present in our tracking */
 -static bool vfio_dev_present(struct device *dev)
 +/**
 + * This does a get on the vfio_device from device.
 + * Callers of this function will have to call vfio_put_device() to
 + * remove the reference.
 + */
 +struct vfio_device *vfio_device_get_from_dev(struct device *dev)
  {

I cc'd you on a patch that changes vfio_dev_present to fix a bug where
the device to iommu group lookup has been shutdown.  Using it for this
purpose probably has the same bug.  That code is in my next branch and
in the most reset pull request to Linus.

I think instead the code becomes much more simple.  We're registering a
callback for a struct device for which vfio-pci is the driver.  During
driver release we disable that callback for the device.  Thus during the
callback, we know the device is owned by vfio-pci, which means that the
drvdata has what we need.  So I think vfio_device_get_from_dev() simply
becomes:

struct vfio_device *vfio_get_device_from_dev(struct device *dev)
{
   struct vfio_device *device = dev_get_drvdata(dev);

   vfio_device_get(device);

   return device;
}
EXPORT_SYMBOL_GPL(vfio_get_device_from_dev);

Thanks,
Alex

   struct iommu_group *iommu_group;
   struct vfio_group *group;
 @@ -651,25 +656,41 @@ static bool vfio_dev_present(struct device *dev)
  
   iommu_group = iommu_group_get(dev);
   if (!iommu_group)
 - return false;
 + return NULL;
  
   group = vfio_group_get_from_iommu(iommu_group);
   if (!group) {
   iommu_group_put(iommu_group);
 - return false;
 + return NULL;
   }
  
   device = vfio_group_get_device(group, dev);
 - if (!device) {
 - vfio_group_put(group);
 - iommu_group_put(iommu_group);
 - return false;
 - }
 -
 - vfio_device_put(device);
   vfio_group_put(group);
   iommu_group_put(iommu_group);
 - return true;
 + return device;
 +}
 +EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 +
 +/*
 + * Caller must hold a reference to the vfio_device
 + */
 +void *vfio_device_data(struct vfio_device *device)
 +{
 + return device-device_data;
 +}
 +EXPORT_SYMBOL_GPL(vfio_device_data);
 +
 +/* Test whether a struct device is present in our tracking */
 +static bool vfio_dev_present(struct device *dev)
 +{
 + struct vfio_device *device;
 +
 + device = vfio_device_get_from_dev(dev);
 + if (device) {
 + vfio_device_put(device);
 + return true;
 + } else
 + return false;
  }
  
  /*
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ab9e862..ac8d488 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev,
 void *device_data);
  
  extern void *vfio_del_group_dev(struct device *dev);
 +extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 +extern void vfio_device_put(struct vfio_device *device);
 +extern void *vfio_device_data(struct vfio_device *device);
  
  /**
   * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Avi Kivity

 I see a couple of possible solutions:
 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
 current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
 will be slow on newer kernels

You could backport the qemu change, verify that it builds, and push it
to stable branches.  It's hardly ideal but if nothing else comes up
then it's a solution.


 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
 running during injection. This assumes that if vcpu is running and does
 not process interrupt it is guest fault and the same can happen on real
 HW too. Coalescing when vcpu is not running though is the result of CPU
 overcommit and should be reported. Cons interface definition is kind of
 murky.

You still have a window where the vcpu is scheduled out with guest
interrupts disabled, then scheduled back in and before it manages to
handle the interrupt, the second one hits.

It's worth testing though.

 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
 notifiers for interrupt reinjection. This requires us to add interface
 for reporting EOI to userspace. This is not in the scope of this
 patchset. Cons: need to introduce new interface (and the one that will
 not work on AMD BTW)

 Other ideas?

1. inject RTC interrupt
2. not see EOI
3. inject RTC interrupt
4. due to 2, report coalesced

AMD can still use IRR test-and-set.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 11:17:39AM -0300, Marcelo Tosatti wrote:
 On Mon, Feb 25, 2013 at 11:00:21AM -0300, Marcelo Tosatti wrote:
   I see a couple of possible solutions:
   1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
   current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
   will be slow on newer kernels
  
  Can add a capability to QEMU and enable APICv selectively only 
  in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu
  only when necessary (and KVM_IRQ_LINE otherwise).
 
 Bad idea. What happens with mixed scenarios.
 
Exactly.

  Even a lock serializing injection is not safe because ON bit is cleared
  before XCHG(PIR, 0). Must do something heavier (such as running on
  target vcpu context).
 
 Note always running on target vcpu is likely to be slower than no-APICv.
 
Yes, but we will do it only for RTC interrupt. Still performance hit
should be very visible when RTC is in 1000Hz mode.

 So need to do something heavier on the kernel under serialization, if 
 firmware cannot be changed (injection from simultaneous CPUs should be
 rare so if data to serialize __accept_apic_irq is cache-line aligned 
 it should reduce performance impact).
 
   2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
   running during injection. This assumes that if vcpu is running and does
   not process interrupt it is guest fault and the same can happen on real
   HW too. Coalescing when vcpu is not running though is the result of CPU
   overcommit and should be reported. Cons interface definition is kind of
   murky.
   3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
   notifiers for interrupt reinjection. This requires us to add interface
   for reporting EOI to userspace. This is not in the scope of this
   patchset. Cons: need to introduce new interface (and the one that will
   not work on AMD BTW)
 
 Is there no int-ack notification at RTC HW level?
There is, but Windows calls it twice for each injected interrupt. I
tried to use it in the past to detect interrupt coalescing, but this
double ack prevented me from doing so. May be revisit this approach with
willingness to be more hacky about it. Also it is possible to disable
RTC ack for HyperV guests. We do not do it and if we will use the ack we
will not be able to.

 
  Breaks older userspace?
Older userspace will have timedrif with Windows, yes. Note that we some
version of Windows it has it now too.

   
   Other ideas?
  
  Can HW write a 'finished' bit after 6 in the reserved area? Suppose its
  not a KVM-specific problem?
  
I still think adding locking just to obtain correct injection status is bad
trade-off even if HW would allow us to get away with it.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 06:50:40PM +0200, Avi Kivity wrote:
 
  I see a couple of possible solutions:
  1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
  current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
  will be slow on newer kernels
 
 You could backport the qemu change, verify that it builds, and push it
 to stable branches.  It's hardly ideal but if nothing else comes up
 then it's a solution.
 
 
  2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
  running during injection. This assumes that if vcpu is running and does
  not process interrupt it is guest fault and the same can happen on real
  HW too. Coalescing when vcpu is not running though is the result of CPU
  overcommit and should be reported. Cons interface definition is kind of
  murky.
 
 You still have a window where the vcpu is scheduled out with guest
 interrupts disabled, then scheduled back in and before it manages to
 handle the interrupt, the second one hits.
 
Yes, definitely not ideal solution.

 It's worth testing though.
 
Yes again. Windows almost never disables interrupts and RTC interrupt is
of highest priority.

  3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
  notifiers for interrupt reinjection. This requires us to add interface
  for reporting EOI to userspace. This is not in the scope of this
  patchset. Cons: need to introduce new interface (and the one that will
  not work on AMD BTW)
 
  Other ideas?
 
 1. inject RTC interrupt
 2. not see EOI
 3. inject RTC interrupt
 4. due to 2, report coalesced

That's the 3 in my list, no?
 
 AMD can still use IRR test-and-set.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM ARM compilation fixes for the 3.9 merge window

2013-02-25 Thread Gleb Natapov

Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

To receive the following KVM bug fixes. They fix ARM KVM compilation breakage
due to changes from kvm.git.


Marc Zyngier (3):
  ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region
  ARM: KVM: Rename KVM_MEMORY_SLOTS - KVM_USER_MEM_SLOTS
  ARM: KVM: fix compilation after removal of user_alloc from struct 
kvm_memory_slot

 arch/arm/include/asm/kvm_host.h |2 +-
 arch/arm/kvm/arm.c  |4 ++--
 arch/arm/kvm/mmu.c  |5 -
 3 files changed, 3 insertions(+), 8 deletions(-)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH qom-cpu v2 4/7] cpu: Pass CPUState to cpu_interrupt()

2013-02-25 Thread Andreas Färber
Move it to qom/cpu.h to avoid issues with include order.

Change pc_acpi_smi_interrupt() opaque to X86CPU.

Signed-off-by: Andreas Färber afaer...@suse.de
---
 cpus.c   |2 +-
 exec.c   |2 +-
 hw/alpha_typhoon.c   |   10 --
 hw/apic.c|   21 +++--
 hw/arm_pic.c |5 ++---
 hw/cris_pic_cpu.c|3 +--
 hw/leon3.c   |3 ++-
 hw/lm32_boards.c |3 +--
 hw/lpc_ich9.c|2 +-
 hw/microblaze_pic_cpu.c  |3 +--
 hw/milkymist.c   |3 +--
 hw/mips_int.c|2 +-
 hw/omap1.c   |4 ++--
 hw/openrisc_pic.c|2 +-
 hw/pc.c  |6 +++---
 hw/pc_piix.c |3 ++-
 hw/ppc.c |6 +++---
 hw/ppc405_uc.c   |4 ++--
 hw/puv3.c|3 +--
 hw/pxa2xx.c  |7 ---
 hw/pxa2xx_gpio.c |2 +-
 hw/pxa2xx_pic.c  |6 +++---
 hw/sh_intc.c |5 +++--
 hw/sun4m.c   |9 ++---
 hw/sun4u.c   |4 ++--
 hw/xtensa_pic.c  |2 +-
 include/exec/cpu-all.h   |   13 -
 include/qom/cpu.h|   24 
 kvm-all.c|4 +---
 target-arm/helper.c  |2 +-
 target-i386/helper.c |6 +++---
 target-m68k/helper.c |2 +-
 target-mips/op_helper.c  |8 
 target-ppc/excp_helper.c |2 +-
 target-s390x/cpu.h   |6 +++---
 target-s390x/helper.c|4 ++--
 translate-all.c  |   12 +---
 37 Dateien geändert, 106 Zeilen hinzugefügt(+), 99 Zeilen entfernt(-)

diff --git a/cpus.c b/cpus.c
index 8d47bfd..e919dd7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1309,7 +1309,7 @@ void qmp_inject_nmi(Error **errp)
 
 for (env = first_cpu; env != NULL; env = env-next_cpu) {
 if (!env-apic_state) {
-cpu_interrupt(env, CPU_INTERRUPT_NMI);
+cpu_interrupt(CPU(x86_env_get_cpu(env)), CPU_INTERRUPT_NMI);
 } else {
 apic_deliver_nmi(env-apic_state);
 }
diff --git a/exec.c b/exec.c
index 9d1605f..f686739 100644
--- a/exec.c
+++ b/exec.c
@@ -1467,7 +1467,7 @@ static void check_watchpoint(int offset, int len_mask, 
int flags)
 /* We re-entered the check after replacing the TB. Now raise
  * the debug interrupt so that is will trigger after the
  * current instruction. */
-cpu_interrupt(env, CPU_INTERRUPT_DEBUG);
+cpu_interrupt(ENV_GET_CPU(env), CPU_INTERRUPT_DEBUG);
 return;
 }
 vaddr = (env-mem_io_vaddr  TARGET_PAGE_MASK) + offset;
diff --git a/hw/alpha_typhoon.c b/hw/alpha_typhoon.c
index dfb10c7..3b15b30 100644
--- a/hw/alpha_typhoon.c
+++ b/hw/alpha_typhoon.c
@@ -62,10 +62,9 @@ static void cpu_irq_change(AlphaCPU *cpu, uint64_t req)
 {
 /* If there are any non-masked interrupts, tell the cpu.  */
 if (cpu != NULL) {
-CPUAlphaState *env = cpu-env;
 CPUState *cs = CPU(cpu);
 if (req) {
-cpu_interrupt(env, CPU_INTERRUPT_HARD);
+cpu_interrupt(cs, CPU_INTERRUPT_HARD);
 } else {
 cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
 }
@@ -359,11 +358,10 @@ static void cchip_write(void *opaque, hwaddr addr,
 for (i = 0; i  4; ++i) {
 AlphaCPU *cpu = s-cchip.cpu[i];
 if (cpu != NULL) {
-CPUAlphaState *env = cpu-env;
 CPUState *cs = CPU(cpu);
 /* IPI can be either cleared or set by the write.  */
 if (newval  (1  (i + 8))) {
-cpu_interrupt(env, CPU_INTERRUPT_SMP);
+cpu_interrupt(cs, CPU_INTERRUPT_SMP);
 } else {
 cpu_reset_interrupt(cs, CPU_INTERRUPT_SMP);
 }
@@ -687,7 +685,7 @@ static void typhoon_set_timer_irq(void *opaque, int irq, 
int level)
 /* Set the ITI bit for this cpu.  */
 s-cchip.misc |= 1  (i + 4);
 /* And signal the interrupt.  */
-cpu_interrupt(cpu-env, CPU_INTERRUPT_TIMER);
+cpu_interrupt(CPU(cpu), CPU_INTERRUPT_TIMER);
 }
 }
 }
@@ -700,7 +698,7 @@ static void typhoon_alarm_timer(void *opaque)
 
 /* Set the ITI bit for this cpu.  */
 s-cchip.misc |= 1  (cpu + 4);
-cpu_interrupt(s-cchip.cpu[cpu]-env, CPU_INTERRUPT_TIMER);
+cpu_interrupt(CPU(s-cchip.cpu[cpu]), CPU_INTERRUPT_TIMER);
 }
 
 PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus,
diff --git a/hw/apic.c b/hw/apic.c
index e113c11..f7cb8b1 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -151,15 +151,15 @@ static void apic_local_deliver(APICCommonState *s, int 
vector)
 
 switch ((lvt  8)  7) {
 case APIC_DM_SMI:
-cpu_interrupt(s-cpu-env, CPU_INTERRUPT_SMI);
+cpu_interrupt(CPU(s-cpu), 

Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Avi Kivity
On Mon, Feb 25, 2013 at 7:43 PM, Gleb Natapov g...@redhat.com wrote:

  3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
  notifiers for interrupt reinjection. This requires us to add interface
  for reporting EOI to userspace. This is not in the scope of this
  patchset. Cons: need to introduce new interface (and the one that will
  not work on AMD BTW)
 
  Other ideas?

 1. inject RTC interrupt
 2. not see EOI
 3. inject RTC interrupt
 4. due to 2, report coalesced

 That's the 3 in my list, no?


No, this idea doesn't involve user interface changes.  We still report
through KVM_IRQ_LINE_STATUS or whatever it's called.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 08:56:07PM +0200, Avi Kivity wrote:
 On Mon, Feb 25, 2013 at 7:43 PM, Gleb Natapov g...@redhat.com wrote:
 
   3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
   notifiers for interrupt reinjection. This requires us to add interface
   for reporting EOI to userspace. This is not in the scope of this
   patchset. Cons: need to introduce new interface (and the one that will
   not work on AMD BTW)
  
   Other ideas?
 
  1. inject RTC interrupt
  2. not see EOI
  3. inject RTC interrupt
  4. due to 2, report coalesced
 
  That's the 3 in my list, no?
 
 
 No, this idea doesn't involve user interface changes.  We still report
 through KVM_IRQ_LINE_STATUS or whatever it's called.
Interesting idea!

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

2013-02-25 Thread Marcelo Tosatti
On Mon, Feb 25, 2013 at 07:40:07PM +0200, Gleb Natapov wrote:
 On Mon, Feb 25, 2013 at 11:17:39AM -0300, Marcelo Tosatti wrote:
  On Mon, Feb 25, 2013 at 11:00:21AM -0300, Marcelo Tosatti wrote:
I see a couple of possible solutions:
1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons:
current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it
will be slow on newer kernels
   
   Can add a capability to QEMU and enable APICv selectively only 
   in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu
   only when necessary (and KVM_IRQ_LINE otherwise).
  
  Bad idea. What happens with mixed scenarios.
  
 Exactly.
 
   Even a lock serializing injection is not safe because ON bit is cleared
   before XCHG(PIR, 0). Must do something heavier (such as running on
   target vcpu context).
  
  Note always running on target vcpu is likely to be slower than no-APICv.
  
 Yes, but we will do it only for RTC interrupt. Still performance hit
 should be very visible when RTC is in 1000Hz mode.

So older qemu-kvm on APICv enabled hardware becomes slow? If that is
acceptable, fine.

  So need to do something heavier on the kernel under serialization, if 
  firmware cannot be changed (injection from simultaneous CPUs should be
  rare so if data to serialize __accept_apic_irq is cache-line aligned 
  it should reduce performance impact).
  
2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not
running during injection. This assumes that if vcpu is running and does
not process interrupt it is guest fault and the same can happen on real
HW too. Coalescing when vcpu is not running though is the result of CPU
overcommit and should be reported. Cons interface definition is kind of
murky.
3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI
notifiers for interrupt reinjection. This requires us to add interface
for reporting EOI to userspace. This is not in the scope of this
patchset. Cons: need to introduce new interface (and the one that will
not work on AMD BTW)
  
  Is there no int-ack notification at RTC HW level?
 There is, but Windows calls it twice for each injected interrupt. I
 tried to use it in the past to detect interrupt coalescing, but this
 double ack prevented me from doing so. May be revisit this approach with
 willingness to be more hacky about it. Also it is possible to disable
 RTC ack for HyperV guests. We do not do it and if we will use the ack we
 will not be able to.

Is it OK to kill the ability to have coalesced information?

   Breaks older userspace?
 Older userspace will have timedrif with Windows, yes. Note that we some
 version of Windows it has it now too.

Other ideas?
   
   Can HW write a 'finished' bit after 6 in the reserved area? Suppose its
   not a KVM-specific problem?
   
 I still think adding locking just to obtain correct injection status is bad
 trade-off even if HW would allow us to get away with it.

Perhaps with notification at end of copy it could be lockless.

With the current scheme, it is not possible to get it right because
the notification bit disappears temporarily from sight. And it is not
possible to distinguish between 'interrupt injected' and 'interrupt
merged', as discussed here. So would have to serialize completly along
the lines of:

Lock and only inject if can identify that interrupt handler is not
running.

If its possible to drop KVM_IRQ_LINE_STATUS, then demand APICv HW to use
recent software, fine (did not grasp Avi's second idea).


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Scott Wood

 On 02/23/2013 09:04:33 AM, Marcelo Tosatti wrote:

On Thu, Feb 21, 2013 at 08:00:25PM -0600, Scott Wood wrote:
 On 02/21/2013 05:03:32 PM, Marcelo Tosatti wrote:
 You are not writing to the registers from the CPU point of view.

 That's exactly how KVM_DEV_MPIC_GRP_REGISTER is defined and
 implemented on MPIC (with the exception of registers whose behavior
 changes based on which specific vcpu you use to access them).
 If/when we have a need to set/get state in a different manner,
 that's a separate attribute group.

Can you describe usage of this register again?


It's used by QEMU to reflect memory space accesses to the kernel.  In  
practice, these are either debug accesses or MSIs.



 Also, it is necessary to provide proper locking of device attribute
 write versus vcpu device access. So far we have been focusing on
 having
 a lockless vcpu path.

 How is device access related to vcpus?  Existing irqchip code is not
 lockless.

VCPUS access in-kernel devices.


Right, VCPUs (plural) not VCPU (singular).  Thus locking is needed.


Yes, it is lockless (see RCU usage in virt/kvm/).


virt/kvm/kvm_main.c: /* kvm_io_bus_write - called under kvm-slots_lock  
*/

virt/kvm/ioapic.c: spin_lock(ioapic-lock);
etc.


 Basically abstract 'device attributes' are too abstract.

 It's up to the device-specific documentation to make them not
 abstract (I admit there are a few details missing in mpic.txt that
 I've pointed out in this thread -- it is RFC v1 after all).  This
 wouldn't be any different if we used separate ioctls for everything.
 It's like saying abstract 'ioctl' is too abstract.

Perhaps a better way to put it is that its too permissive.


It's no more permissive than saying go define an ioctl.  It is  
somewhat more constrained than that, in that it is an operation on a  
particular device, as opposed to any possible KVM action.



 However, your proposed interface deals with sucky capability,
 versioning
 and namespace conflicts we have now. Note these items can probably  
be

 improved separately.

 Any particular proposals?

Namespace conflicts: Reserve ranges for each arch.


That only helps when the conflict is between different arches (as  
opposed to different devices), and it discourages sharing between  
arches.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Scott Wood

On 02/24/2013 09:46:17 AM, Gleb Natapov wrote:

On Thu, Feb 21, 2013 at 08:17:54PM -0600, Scott Wood wrote:
  On 02/21/2013 02:22:09 AM, Gleb Natapov wrote:
 On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote:
  You mean like what we did with SREGS, that got deprecated and
  replaced with ONE_REG?
 
 SREGS is implemented by ppc. I see ONE_REG as addition to REGS
 interface. You can access all register at once or you can access  
them
 one by one. If there is a need we can add MULTIPLE_REGS that will  
get

 list of requested REGS.

 http://www.spinics.net/lists/kvm-ppc/msg04876.html
 http://www.spinics.net/lists/kvm-ppc/msg05842.html

If Alex prefers ONE_REG interface this is his call :)


My point is the reasons we prefer it, that SREGS really is deprecated  
for

new state on PPC (REGS is not extensible so it's been in that state for
even longer -- hence SREGS), and the desire to not create that same mess
again.  Running out of space is just one of the ways in which SREGS is a
mess, and I think trying to represent MPIC state that way would be even
worse.


 The interface is not over generic i.e it does
 not try to replace KVM_RUN by writing special register.

 Sigh.

 And we
  don't necessarily want to set *everything*.
 What are those cases? You do need to on reset/migration.

 Why do we want to set all the registers on reset, rather than tell
 the in-kernel device to reset?  The default state came from the
 kernel in the first place on irqchip creation...

I have nothing against telling in-kernel device to reset provided  
there

is a way to do so, which current interface lacks.


If by current you mean what is proposed here, you can use
KVM_MPIC_DEV_MPIC_REGISTER to set GCR[RESET].

I can see the appeal of an explicit reset device command, though,
especially if the hardware reset bit is defined such that you're  
supposed

to poll until it's cleared to know that the reest is complete (the KVM
implementation is not likely to return before the reset is complete, but
it would be better to not rely on such knowledge).


Reset in userspase has
its advantage too: bugs are easier to fix,


They're harder to fix if fixing it requires adding something to the API.
Bugs in reset are just a small portion of overall bugs, so I'm not sure
how important this is.  It would require that we implement additional
state accessors now, rather than if and when we do migration.  It could
cause the total bug count to go up versus code that already works and is
self-contained.  :-P


there may be different kind of resets (hard/soft).


This can be communicated through a device-specific command if it is
applicable (it isn't for MPIC).


 Again, we do not currently support migration on MPIC.  It is a very
 low priority for embedded.  We do not wish to rule it out entirely,
 but it most likely would require adding more state accesors.
The interface suppose to be generic,
we are not talking about MPIC
specifically here.


There are two separate things here.  There is the device control api,
which is generic, but is just a way to create devices and issue commands
to them.

Then there is the specific device implementation, which is not generic.
The details of how migration is handled for a particular device belong  
to

the specific device implementation.


Regarding migration never say never  :)


I did say we don't want to rule it out -- I'm just emphasising this so
that we don't get bogged down in questions of how the currently defined
MPIC attributes/commands would get used for migration.  If we tried to
blindly add support for that now, without being able to test, or being
able to refer to what QEMU normally saves/restores for MPIC, we'd
probably get it wrong.  Just as the untested savevm code in QEMU's
hw/openpic.c is wrong/incomplete.


 As for other types of devices, x86 has i8254.c emulated in-kernel --
 I know that's not going to switch to the new interface, but it could
 have if it existed back then.
Since it is not going to switch it is not a good example.


The point of the example is to show that irqchips aren't the only thing
that could ever make sense to emulate in the kernel.  I can't see into
the future and tell you what the actual first non-irqchip user would be,
or if there will definitely be one.

If we wait until that user comes along, we'd end up saying *that* is
the only user and since MPIC is not going to switch it is not a good
example.  If no non-irqchip user ends up coming along, what is the
actual harm from not putting the words irqchip in the interface
description (the generic part, not the device-specific part)?


I can also see creating an in-kernel
 emulation device for doing MMIO filtering on some piece of embedded
 hardware that guests need to access with reasonable performance, but
 the hardware desginers screwed up the protection slightly (e.g. put
 other things in the same 4K page).  We've done such 

Re: [PATCH] x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock

2013-02-25 Thread Marcelo Tosatti
On Sat, Feb 23, 2013 at 05:05:29PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 This fixes boot lockups with no-kvmclock, when the host is not
 exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when
 the kvmclock initialization failed for whatever reason.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 Should go to 3.8 as well, I presume.
 
  arch/x86/kernel/kvmclock.c |9 -
  1 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index 5bedbdd..b730efa 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -160,8 +160,12 @@ int kvm_register_clock(char *txt)
  {
   int cpu = smp_processor_id();
   int low, high, ret;
 - struct pvclock_vcpu_time_info *src = hv_clock[cpu].pvti;
 + struct pvclock_vcpu_time_info *src;
 +
 + if (!hv_clock)
 + return 0;
  
 + src = hv_clock[cpu].pvti;
   low = (int)__pa(src) | 1;
   high = ((u64)__pa(src)  32);
   ret = native_write_msr_safe(msr_kvm_system_time, low, high);
 @@ -276,6 +280,9 @@ int __init kvm_setup_vsyscall_timeinfo(void)
   struct pvclock_vcpu_time_info *vcpu_time;
   unsigned int size;
  
 + if (!hv_clock)
 + return 0;
 +
   size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
  
   preempt_disable();
 -- 
 1.7.3.4
 

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


buildbot failure in kvm on s390

2013-02-25 Thread kvm
The Buildbot has detected a new failure on builder s390 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/s390/builds/822

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_master' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Gleb Natapov
On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote:
 On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote:
  Copying Christoffer since ARM has in kernel irq chip too.
  
  On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote:
   Currently, devices that are emulated inside KVM are configured in a
   hardcoded manner based on an assumption that any given architecture
   only has one way to do it.  If there's any need to access device state,
   it is done through inflexible one-purpose-only IOCTLs (e.g.
   KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
   cumbersome and depletes a limited numberspace.
   
   This API provides a mechanism to instantiate a device of a certain
   type, returning an ID that can be used to set/get attributes of the
   device.  Attributes may include configuration parameters (e.g.
   register base address), device state, operational commands, etc.  It
   is similar to the ONE_REG API, except that it acts on devices rather
   than vcpus.
  You are not only provide different way to create in kernel irq chip you
  also use an alternate way to trigger interrupt lines. Before going into
  interface specifics lets think about whether it is really worth it? x86
  obviously support old way and will have to for some, very long, time.
  ARM vGIC code, that is ready to go upstream, uses old way too. So it will
  be 2 archs against one. Christoffer do you think the proposed way it
  better for your needs. Are you willing to make vGIC use it?
 
 In fact there have been two distinct interrupt controller emulations
 for PPC posted recently, Scott's and mine, with quite different
 interfaces.  In contrast to Scott's device control API, where the
 operations you would do for different interrupt controllers are quite
 different, mine attempted to define a much more abstract interface for
 interrupt controller emulations, with the idea that an interrupt
 controller subsystem connects a set of interrupt sources, each with
 some state, to a set of interrupt delivery mechanisms, one per vcpu,
 also with some state.
 
I agree with Scott that it is futile to try and come up with generic
irqchip configuration interface and I doubt it is needed from QEMU
or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES
interface and while it is possible to pass some information about
pic/ioapic using it there will be a lot of information that will not
fit there. For one there is global irqchips related state and proposed
interface only talk about interrupt sources. Another is that using
generic interface will require us to have a code that translate irqchip
representation into this generic one and back for no apparent gain.
Currently pic/ioapic state is very similar to what HW specification
defines and it is not going to change.

Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how
well it work for migration though. The interface suppose to transfer
irqchips state as is, but I see things like that in your code:

   mutex_lock(ics-lock);
   irqp-server = val  KVM_IRQ_SERVER_MASK;
   irqp-priority = val  KVM_IRQ_PRIORITY_SHIFT;
   irqp-resend = 0;
   irqp-masked_pending = 0;
   irqp-asserted = 0;

Why it is safe to initialize those values to default values during
migration? Wouldn't it be simpler and more correct to just transfer
the whole content of irqp from src to dst?

 Thus my interface had:
 
 - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that
   indicates the overall architecture of the interrupt subsystem,
 
 - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that
   return or modify the state of some set of interrupt sources
 
 - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register
   identifier space, that is used to save and restore the per-vcpu
   interrupt presentation state.
 
 Alternatively, I could modify my code to use the existing
 KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls.  If I
 were to do that I would want to rename the 'pad' field in struct
 kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of
 interrupt sources to be affected.  I'd also want to keep the ONE_REG
 identifier for the per-vcpu state.
It is preferable to the interface you propose since I do not think your
interface fits other interrupt controllers well. You can put nr field
into dummy[] payload, or replace pad with union {pad, nr}.

 
 Or I could change over to using Scott's device control API, though I
 have some objections to doing that.
 
 What is your advice about which interface to use?
 
The ideal situation would be if you and Scott agree on the details about
the interface. If you don't like something about Scott's interface we
can discuss it and shape it to something you agree with or even like.
I already asked Scott to introduce command interface. Scott does not

Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Alexander Graf

On 24.02.2013, at 16:46, Gleb Natapov wrote:

 On Thu, Feb 21, 2013 at 08:17:54PM -0600, Scott Wood wrote:
 On 02/21/2013 02:22:09 AM, Gleb Natapov wrote:
 On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote:
 On 02/20/2013 07:09:49 AM, Gleb Natapov wrote:
 On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote:
 On 02/19/2013 06:24:18 AM, Gleb Natapov wrote:
 On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote:
 The ability to set/get attributes is needed.  Sorry, but get
 or set
 one blob of data, up to 512 bytes, for the entire irqchip is
 just
 not good enough -- assuming you don't want us to start
 sticking
 pointers and commands in *that* data. :-)
 
 Proposed interface sticks pointers into ioctl data, so why doing
 the same
 for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile.
 
 There's a difference between putting a pointer in an ioctl
 control
 structure that is specifically documented as being that way
 (as in
 ONE_REG), versus taking an ioctl that claims to be
 setting/getting a
 blob of state and embedding pointers in it.  It would be like
 sticking a pointer in the attribute payload of this API, which I
 think is something to be discouraged.
 If documentation is what differentiate for you between silly
 and smart
 then write documentation instead of new interfaces.
 
 You mean like what we did with SREGS, that got deprecated and
 replaced with ONE_REG?
 
 SREGS is implemented by ppc. I see ONE_REG as addition to REGS
 interface. You can access all register at once or you can access them
 one by one. If there is a need we can add MULTIPLE_REGS that will get
 list of requested REGS.
 
 http://www.spinics.net/lists/kvm-ppc/msg04876.html
 http://www.spinics.net/lists/kvm-ppc/msg05842.html
 
 If Alex prefers ONE_REG interface this is his call :)

There's a reason I prefer it :). We ran out of space in the SREGS call, we had 
a bit map with multiple chunks of data you could set in there and worst of all, 
when we ran out of space we didn't realize it until the code was already in 
Linus' tree!

The normal here's a chunk of data ioctl style works great when you know all 
or most of the details of what you want to transfer. In the case of kvm, we 
need to expose an interface we

  a) Don't control itself. Hardware guys make hardware. We just follow.
  b) Is pretty complex. There can be a lot of state in a CPU / device.

So because if that we needed something more flexible and dynamic. Something 
where we could add registers we forgot without jumping through lots of hoops.

That's why I prefer ONE_REG for vcpus. Since I see the same danger on device 
emulation, I tend to prefer it here as well.

 
 The interface is not over generic i.e it does
 not try to replace KVM_RUN by writing special register.
 
 Sigh.
 
   And we
 don't necessarily want to set *everything*.
 What are those cases? You do need to on reset/migration.
 
 Why do we want to set all the registers on reset, rather than tell
 the in-kernel device to reset?  The default state came from the
 kernel in the first place on irqchip creation...
 
 I have nothing against telling in-kernel device to reset provided there
 is a way to do so, which current interface lacks. Reset in userspase has
 its advantage too: bugs are easier to fix, there may be different kind
 of resets (hard/soft).

The same argument holds for state access. If you suddenly realize oh, I need 
to also set this one hidden state bit in my hardware you only want to add that 
specific write on reset.

If you suddenly find a race condition in migration that happens very rarely, 
you want to add the one register that you forgot to synchronize in addition in 
your new code. You don't want to change the whole ioctl passed struct.

That's why we have the padding in our ioctl structs. And feature bitmaps. And 
once we exceed them, we're screwed.

I want to make sure we don't hit that point.

Actually, there's a really easy way to understand what this interface does. It 
basically implements a state get/set ioctl which allows for very fine grained 
control over your ioctl struct. It does that by only passing one register at a 
time.

If we run into concurrency issues (multiple registers need to be set in a 
locked manner at the same time), we can always add a new ioctl that sets 
multiple attrs/regs at once. For CPUs, we haven't hit that case yet.

 
   It'd also be using
 KVM_SET_IRQCHIP to read data, which is the sort of thing you
 object
 to later on regarding KVM_IRQ_LINE_STATUS.
 
 Do not see why.
 
 It's either that, or have the data direction of the chip field in
 KVM_GET_IRQCHIP not be entirely in the get direction.
 
 Still do not follow. Example?
 
 struct kvm_irqchip has chip_id, pad, and chip.  pad is
 insufficient to communicate attribute type plus a pointer.  So if we
 want to provide a pointer for the kernel to write the attribute
 into, it has to read from memory that 

Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Alexander Graf

On 25.02.2013, at 14:09, Gleb Natapov wrote:

 On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote:
 On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote:
 Copying Christoffer since ARM has in kernel irq chip too.
 
 On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote:
 Currently, devices that are emulated inside KVM are configured in a
 hardcoded manner based on an assumption that any given architecture
 only has one way to do it.  If there's any need to access device state,
 it is done through inflexible one-purpose-only IOCTLs (e.g.
 KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
 cumbersome and depletes a limited numberspace.
 
 This API provides a mechanism to instantiate a device of a certain
 type, returning an ID that can be used to set/get attributes of the
 device.  Attributes may include configuration parameters (e.g.
 register base address), device state, operational commands, etc.  It
 is similar to the ONE_REG API, except that it acts on devices rather
 than vcpus.
 You are not only provide different way to create in kernel irq chip you
 also use an alternate way to trigger interrupt lines. Before going into
 interface specifics lets think about whether it is really worth it? x86
 obviously support old way and will have to for some, very long, time.
 ARM vGIC code, that is ready to go upstream, uses old way too. So it will
 be 2 archs against one. Christoffer do you think the proposed way it
 better for your needs. Are you willing to make vGIC use it?
 
 In fact there have been two distinct interrupt controller emulations
 for PPC posted recently, Scott's and mine, with quite different
 interfaces.  In contrast to Scott's device control API, where the
 operations you would do for different interrupt controllers are quite
 different, mine attempted to define a much more abstract interface for
 interrupt controller emulations, with the idea that an interrupt
 controller subsystem connects a set of interrupt sources, each with
 some state, to a set of interrupt delivery mechanisms, one per vcpu,
 also with some state.
 
 I agree with Scott that it is futile to try and come up with generic
 irqchip configuration interface and I doubt it is needed from QEMU
 or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES
 interface and while it is possible to pass some information about
 pic/ioapic using it there will be a lot of information that will not
 fit there. For one there is global irqchips related state and proposed
 interface only talk about interrupt sources. Another is that using
 generic interface will require us to have a code that translate irqchip
 representation into this generic one and back for no apparent gain.
 Currently pic/ioapic state is very similar to what HW specification
 defines and it is not going to change.
 
 Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how
 well it work for migration though. The interface suppose to transfer
 irqchips state as is, but I see things like that in your code:
 
   mutex_lock(ics-lock);
   irqp-server = val  KVM_IRQ_SERVER_MASK;
   irqp-priority = val  KVM_IRQ_PRIORITY_SHIFT;
   irqp-resend = 0;
   irqp-masked_pending = 0;
   irqp-asserted = 0;
 
 Why it is safe to initialize those values to default values during
 migration? Wouldn't it be simpler and more correct to just transfer
 the whole content of irqp from src to dst?
 
 Thus my interface had:
 
 - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that
  indicates the overall architecture of the interrupt subsystem,
 
 - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that
  return or modify the state of some set of interrupt sources
 
 - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register
  identifier space, that is used to save and restore the per-vcpu
  interrupt presentation state.
 
 Alternatively, I could modify my code to use the existing
 KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls.  If I
 were to do that I would want to rename the 'pad' field in struct
 kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of
 interrupt sources to be affected.  I'd also want to keep the ONE_REG
 identifier for the per-vcpu state.
 It is preferable to the interface you propose since I do not think your
 interface fits other interrupt controllers well. You can put nr field
 into dummy[] payload, or replace pad with union {pad, nr}.
 
 
 Or I could change over to using Scott's device control API, though I
 have some objections to doing that.
 
 What is your advice about which interface to use?
 
 The ideal situation would be if you and Scott agree on the details about
 the interface. If you don't like something about Scott's interface we
 can discuss it and shape it to something you agree with or even like.
 I already asked Scott to 

Re: [RFC PATCH 1/6] kvm: add device control API

2013-02-25 Thread Scott Wood

 On 02/23/2013 09:04:33 AM, Marcelo Tosatti wrote:

On Thu, Feb 21, 2013 at 08:00:25PM -0600, Scott Wood wrote:
 On 02/21/2013 05:03:32 PM, Marcelo Tosatti wrote:
 You are not writing to the registers from the CPU point of view.

 That's exactly how KVM_DEV_MPIC_GRP_REGISTER is defined and
 implemented on MPIC (with the exception of registers whose behavior
 changes based on which specific vcpu you use to access them).
 If/when we have a need to set/get state in a different manner,
 that's a separate attribute group.

Can you describe usage of this register again?


It's used by QEMU to reflect memory space accesses to the kernel.  In  
practice, these are either debug accesses or MSIs.



 Also, it is necessary to provide proper locking of device attribute
 write versus vcpu device access. So far we have been focusing on
 having
 a lockless vcpu path.

 How is device access related to vcpus?  Existing irqchip code is not
 lockless.

VCPUS access in-kernel devices.


Right, VCPUs (plural) not VCPU (singular).  Thus locking is needed.


Yes, it is lockless (see RCU usage in virt/kvm/).


virt/kvm/kvm_main.c: /* kvm_io_bus_write - called under kvm-slots_lock  
*/

virt/kvm/ioapic.c: spin_lock(ioapic-lock);
etc.


 Basically abstract 'device attributes' are too abstract.

 It's up to the device-specific documentation to make them not
 abstract (I admit there are a few details missing in mpic.txt that
 I've pointed out in this thread -- it is RFC v1 after all).  This
 wouldn't be any different if we used separate ioctls for everything.
 It's like saying abstract 'ioctl' is too abstract.

Perhaps a better way to put it is that its too permissive.


It's no more permissive than saying go define an ioctl.  It is  
somewhat more constrained than that, in that it is an operation on a  
particular device, as opposed to any possible KVM action.



 However, your proposed interface deals with sucky capability,
 versioning
 and namespace conflicts we have now. Note these items can probably  
be

 improved separately.

 Any particular proposals?

Namespace conflicts: Reserve ranges for each arch.


That only helps when the conflict is between different arches (as  
opposed to different devices), and it discourages sharing between  
arches.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html