Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-20 21:14, Peter Maydell wrote:
 kvm_allows_irq0_override() is a totally x86 specific concept:
 move it to the target-specific source file where it belongs.
 This means we need a new header file for the prototype:
 kvm_i386.h, in line with the existing kvm_ppc.h.

First of all, the patch is inconsistent as it lacks something like
target-i386/kvm-stub.c (IOW, you forgot about kvm_allows_irq0_override
in kvm-stub.c). But the approach is wrong in general, see below.

 
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
 I'm sure this isn't the only x86ism in the KVM generic source
 files. However the thing I'm specifically trying to do is
 nuke all the uses of kvm_irqchip_in_kernel() in common code,

No, irqchip in kernel is supposed to be a generic concept. We will
also have it on Power. Not sure what your plans are for ARM, maybe it
will always be true there.

That said, maybe there is room for discussion about what it means for
the general KVM code and its users if the irqchip is in the kernel. Two
things that should be common for every arch:
 - VCPU idle management is done inside the kernel
 - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

The latter point implies that irqfd is available and that interrupt
routes from virtual IRQs (*) (like the one associated with an irqfd) to
the in-kernel IRQ controller have to be established. That's pretty generic.

 as part of trying to untangle what irqchip model do you want
 from other aspects of the interrupt handling model.
 
 Other than this one, there are uses in cpus.c and hw/virtio-pci.c,
 but I haven't figured out yet exactly what those bits of code
 are trying to do...

See above for the two reasons.

Jan

(*) GSI actually means physical or virtual IRQ line.



signature.asc
Description: OpenPGP digital signature


[RFC-v4 0/3] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-21 Thread Nicholas A. Bellinger
From: Nicholas Bellinger n...@linux-iscsi.org

Hi MST, Greg-KH  Co,

The following is -v4 of the in-flight TCM vhost fabric driver for-3.6 code.
This series has been rebased into target-pending.git/for-next-merge this
evening, and the changelog over the last days from v3 - v4 has been:

*) Rename vhost_vring_target - vhost_scsi_target (mst + nab)
*) Use TRANSPORT_IQN_LEN in vhost_scsi_target-vhost_wwpn[] def (nab)
*) Move back to drivers/vhost/, and just use drivers/vhost/Kconfig.tcm (mst)
*) Move TCM_VHOST related ioctl defines from include/linux/vhost.h -
   drivers/vhost/tcm_vhost.h as requested by MST (nab)
*) Move Kbuild.tcm include back from drivers/staging - drivers/vhost/, and
   just use 'if STAGING' around 'source drivers/vhost/Kbuild.tcm'

This series uses Greg-KH's last recommendation from the linux-next thread
to just KISS + use the STAGING bit for this driver short of moving wholesale
into drivers/staging/tcm_vhost/.

As mentioned in the same thread, MST would like to see this in staging so we
don't have to commit to a ABI for QEMU userspace just yet.  Can we agree that
the STAGING bit usage here is enough to mark this code as staging for users..? 
Pretty please..?

The reason that -v4 currently avoids a drivers/staging/tcm_vhost/ move is to
prevent the staging tree needing to merge vhost + target-pending/for-next for
tcm_vhost build requirements.  It is easier to merge via target-pending w/ the
necessary ACKs for the drivers/vhost/ bits, but I'm fine with posting a -v5
series ASAP to move this code into drivers/staging/tcm_vhost + let staging
handle the necessary vhost + target merge dependencies.

I'll let the staging folks (Greg-KH..?) decided if they are OK with the
extra tree merges here before nominating them.  ;)

So aside from the 'marked as staging' directory location back - forth,
please consider ACK'ing the main bits for an initial merge.

Thank you!

Nicholas Bellinger (1):
  tcm_vhost: Initial merge for vhost level target fabric driver

Stefan Hajnoczi (2):
  vhost: Separate vhost-net features from vhost features
  vhost: make vhost work queue visible

 drivers/vhost/Kconfig |3 +
 drivers/vhost/Kconfig.tcm |6 +
 drivers/vhost/Makefile|2 +
 drivers/vhost/net.c   |4 +-
 drivers/vhost/tcm_vhost.c | 1611 +
 drivers/vhost/tcm_vhost.h |   90 +++
 drivers/vhost/test.c  |4 +-
 drivers/vhost/vhost.c |5 +-
 drivers/vhost/vhost.h |6 +-
 9 files changed, 1723 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vhost/Kconfig.tcm
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

-- 
1.7.2.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v4 1/3] vhost: Separate vhost-net features from vhost features

2012-07-21 Thread Nicholas A. Bellinger
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

In order for other vhost devices to use the VHOST_FEATURES bits the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
constant.

(Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES)

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Cc: Zhi Yong Wu wu...@cn.ibm.com
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Asias He as...@redhat.com
Signed-off-by: Nicholas A. Bellinger n...@risingtidesystems.com
---
 drivers/vhost/net.c   |4 ++--
 drivers/vhost/test.c  |4 ++--
 drivers/vhost/vhost.h |3 ++-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f82a739..072cbba 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int 
ioctl,
return -EFAULT;
return vhost_net_set_backend(n, backend.index, backend.fd);
case VHOST_GET_FEATURES:
-   features = VHOST_FEATURES;
+   features = VHOST_NET_FEATURES;
if (copy_to_user(featurep, features, sizeof features))
return -EFAULT;
return 0;
case VHOST_SET_FEATURES:
if (copy_from_user(features, featurep, sizeof features))
return -EFAULT;
-   if (features  ~VHOST_FEATURES)
+   if (features  ~VHOST_NET_FEATURES)
return -EOPNOTSUPP;
return vhost_net_set_features(n, features);
case VHOST_RESET_OWNER:
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 3de00d9..91d6f06 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -261,14 +261,14 @@ static long vhost_test_ioctl(struct file *f, unsigned int 
ioctl,
return -EFAULT;
return vhost_test_run(n, test);
case VHOST_GET_FEATURES:
-   features = VHOST_FEATURES;
+   features = VHOST_NET_FEATURES;
if (copy_to_user(featurep, features, sizeof features))
return -EFAULT;
return 0;
case VHOST_SET_FEATURES:
if (copy_from_user(features, featurep, sizeof features))
return -EFAULT;
-   if (features  ~VHOST_FEATURES)
+   if (features  ~VHOST_NET_FEATURES)
return -EOPNOTSUPP;
return vhost_test_set_features(n, features);
case VHOST_RESET_OWNER:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8de1fd5..07b9763 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -201,7 +201,8 @@ enum {
VHOST_FEATURES = (1ULL  VIRTIO_F_NOTIFY_ON_EMPTY) |
 (1ULL  VIRTIO_RING_F_INDIRECT_DESC) |
 (1ULL  VIRTIO_RING_F_EVENT_IDX) |
-(1ULL  VHOST_F_LOG_ALL) |
+(1ULL  VHOST_F_LOG_ALL),
+   VHOST_NET_FEATURES = VHOST_FEATURES |
 (1ULL  VHOST_NET_F_VIRTIO_NET_HDR) |
 (1ULL  VIRTIO_NET_F_MRG_RXBUF),
 };
-- 
1.7.2.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v4 2/3] vhost: make vhost work queue visible

2012-07-21 Thread Nicholas A. Bellinger
From: Stefan Hajnoczi stefa...@gmail.com

The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm.  Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions.  However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Cc: Zhi Yong Wu wu...@cn.ibm.com
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
---
 drivers/vhost/vhost.c |5 ++---
 drivers/vhost/vhost.h |3 +++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94dbd25..1aab08b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -64,7 +64,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned 
mode, int sync,
return 0;
 }
 
-static void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 {
INIT_LIST_HEAD(work-node);
work-fn = fn;
@@ -137,8 +137,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
vhost_work_flush(poll-dev, poll-work);
 }
 
-static inline void vhost_work_queue(struct vhost_dev *dev,
-   struct vhost_work *work)
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
unsigned long flags;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 07b9763..1125af3 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -43,6 +43,9 @@ struct vhost_poll {
struct vhost_dev *dev;
 };
 
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 unsigned long mask, struct vhost_dev *dev);
 void vhost_poll_start(struct vhost_poll *poll, struct file *file);
-- 
1.7.2.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v4 3/3] tcm_vhost: Initial merge for vhost level target fabric driver

2012-07-21 Thread Nicholas A. Bellinger
From: Nicholas Bellinger n...@linux-iscsi.org

This patch adds the initial code for tcm_vhost, a Vhost level TCM
fabric driver for virtio SCSI initiators into KVM guest.

This code is currently up and running on v3.5-rc2 host+guest along
with the virtio-scsi vdev-scan() patch to allow a proper
scsi_scan_host() to occur once the tcm_vhost nexus has been established
by the paravirtualized virtio-scsi client here:

virtio-scsi: Add vdrv-scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
http://marc.info/?l=linux-scsim=134160609212542w=2

Using tcm_vhost requires Zhi's - Stefan's qemu vhost-scsi tree here:

https://github.com/wuzhy/qemu/tree/vhost-scsi

along with the recent QEMU patch to hw/virtio-scsi.c to set max_target=0
during vhost-scsi operation.

Changelog v3 - v4:

  Rename vhost_vring_target - vhost_scsi_target (mst + nab)
  Use TRANSPORT_IQN_LEN in vhost_scsi_target-vhost_wwpn[] def (nab)
  Move back to drivers/vhost/, and just use drivers/vhost/Kconfig.tcm (mst)
  Move TCM_VHOST related ioctl defines from include/linux/vhost.h -
  drivers/vhost/tcm_vhost.h as requested by MST (nab)
  Move Kbuild.tcm include from drivers/staging - drivers/vhost/, and
  just use 'if STAGING' around 'source drivers/vhost/Kbuild.tcm'

Changelog v2 - v3:

  Unlock on error in tcm_vhost_drop_nexus() (DanC)
  Fix strlen() doesn't count the terminator (DanC)
  Call kfree() on an error path (DanC)
  Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
  Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
  Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
  as requested by MST (nab)

Changelog v1 - v2:

  Fix tv_cmd completion - release SGL memory leak (nab)
  Fix sparse warnings for static variable usage ((Fengguang Wu)
  Fix sparse warnings for min() typing + printk format specs (Fengguang Wu)
  Convert to cmwq submission for I/O dispatch (nab + hch)

Changelog v0 - v1:

  Merge into single source + header file, and move to drivers/vhost/

Cc: Michael S. Tsirkin m...@redhat.com
Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Cc: Zhi Yong Wu wu...@cn.ibm.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Christoph Hellwig h...@lst.de
Cc: Hannes Reinecke h...@suse.de
Cc: Jens Axboe ax...@kernel.dk
Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
---
 drivers/vhost/Kconfig |3 +
 drivers/vhost/Kconfig.tcm |6 +
 drivers/vhost/Makefile|2 +
 drivers/vhost/tcm_vhost.c | 1611 +
 drivers/vhost/tcm_vhost.h |   90 +++
 5 files changed, 1712 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/Kconfig.tcm
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..202bba6 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,6 @@ config VHOST_NET
  To compile this driver as a module, choose M here: the module will
  be called vhost_net.
 
+if STAGING
+source drivers/vhost/Kconfig.tcm
+endif
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
new file mode 100644
index 000..a9c6f76
--- /dev/null
+++ b/drivers/vhost/Kconfig.tcm
@@ -0,0 +1,6 @@
+config TCM_VHOST
+   tristate TCM_VHOST fabric module (EXPERIMENTAL)
+   depends on TARGET_CORE  EVENTFD  EXPERIMENTAL  m
+   default n
+   ---help---
+   Say M here to enable the TCM_VHOST fabric module for use with 
virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..a27b053 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
+
+obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
new file mode 100644
index 000..dc7e024
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.c
@@ -0,0 +1,1611 @@
+/***
+ * Vhost kernel TCM fabric driver for virtio SCSI initiators
+ *
+ * (C) Copyright 2010-2012 RisingTide Systems LLC.
+ * (C) Copyright 2010-2012 IBM Corp.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) 
version 2.
+ *
+ * Authors: Nicholas A. Bellinger n...@risingtidesystems.com
+ *  Stefan Hajnoczi stefa...@linux.vnet.ibm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ 

[PATCH v7] kvm: notify host when the guest is panicked

2012-07-21 Thread Wen Congyang
We can know the guest is panicked when the guest runs on xen.
But we do not have such feature on kvm.

Another purpose of this feature is: management app(for example:
libvirt) can do auto dump when the guest is panicked. If management
app does not do auto dump, the guest's user can do dump by hand if
he sees the guest is panicked.

We have three solutions to implement this feature:
1. use vmcall
2. use I/O port
3. use virtio-serial.

We have decided to avoid touching hypervisor. The reason why I choose
choose the I/O port is:
1. it is easier to implememt
2. it does not depend any virtual device
3. it can work when startint the kernel

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/ia64/include/asm/kvm_para.h|5 +
 arch/powerpc/include/asm/kvm_para.h |5 +
 arch/s390/include/asm/kvm_para.h|5 +
 arch/x86/include/asm/kvm_para.h |7 +++
 arch/x86/kernel/kvm.c   |   14 ++
 include/linux/kvm_para.h|   13 +
 6 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h
index 2019cb9..187c0e2 100644
--- a/arch/ia64/include/asm/kvm_para.h
+++ b/arch/ia64/include/asm/kvm_para.h
@@ -31,6 +31,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index c18916b..be81aac 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -211,6 +211,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* __POWERPC_KVM_PARA_H__ */
diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h
index a988329..3d993b7 100644
--- a/arch/s390/include/asm/kvm_para.h
+++ b/arch/s390/include/asm/kvm_para.h
@@ -154,6 +154,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* __S390_KVM_PARA_H */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 63ab166..c8ad86e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,6 +89,8 @@ struct kvm_vcpu_pv_apf_data {
__u32 enabled;
 };
 
+#define KVM_PV_PORT(0x505UL)
+
 #ifdef __KERNEL__
 #include asm/processor.h
 
@@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
 }
 #endif
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return inl(KVM_PV_PORT);
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..9a97f7e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -328,6 +328,17 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
+static int
+kvm_pv_panic_notify(struct notifier_block *nb, unsigned long code, void 
*unused)
+{
+   outl(KVM_PV_PANICKED, KVM_PV_PORT);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_pv_panic_nb = {
+   .notifier_call = kvm_pv_panic_notify,
+};
+
 static u64 kvm_steal_clock(int cpu)
 {
u64 steal;
@@ -414,6 +425,9 @@ void __init kvm_guest_init(void)
 
paravirt_ops_setup();
register_reboot_notifier(kvm_pv_reboot_nb);
+   if (kvm_pv_has_feature(KVM_PV_FEATURE_PANICKED))
+   atomic_notifier_chain_register(panic_notifier_list,
+   kvm_pv_panic_nb);
for (i = 0; i  KVM_TASK_SLEEP_HASHSIZE; i++)
spin_lock_init(async_pf_sleepers[i].lock);
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..e73efcf 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -20,6 +20,12 @@
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
 
+/* The bit of the value read from KVM_PV_PORT */
+#define KVM_PV_FEATURE_PANICKED0
+
+/* The value writen to KVM_PV_PORT */
+#define KVM_PV_PANICKED1
+
 /*
  * hypercalls use architecture specific
  */
@@ -33,5 +39,12 @@ static inline int kvm_para_has_feature(unsigned int feature)
return 1;
return 0;
 }
+
+static inline int kvm_pv_has_feature(unsigned int feature)
+{
+   if (kvm_arch_pv_features()  (1UL  feature))
+   return 1;
+   return 0;
+}
 #endif /* __KERNEL__ */
 #endif /* __LINUX_KVM_PARA_H */
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

[PATCH 1/6 v7] start vm after reseting it

2012-07-21 Thread Wen Congyang
The guest should run after reseting it, but it does not run if its
old state is RUN_STATE_INTERNAL_ERROR or RUN_STATE_PAUSED.

We don't set runstate to RUN_STATE_PAUSED when reseting the guest,
so the runstate will be changed from RUN_STATE_INTERNAL_ERROR or
RUN_STATE_PAUSED to RUN_STATE_RUNNING(not RUN_STATE_PAUSED).

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 block.h |2 ++
 qmp.c   |2 +-
 vl.c|7 ---
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/block.h b/block.h
index c89590d..4ed042d 100644
--- a/block.h
+++ b/block.h
@@ -337,6 +337,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
+void iostatus_bdrv_it(void *opaque, BlockDriverState *bs);
+
 enum BlockAcctType {
 BDRV_ACCT_READ,
 BDRV_ACCT_WRITE,
diff --git a/qmp.c b/qmp.c
index fee9fb2..a111dff 100644
--- a/qmp.c
+++ b/qmp.c
@@ -125,7 +125,7 @@ SpiceInfo *qmp_query_spice(Error **errp)
 };
 #endif
 
-static void iostatus_bdrv_it(void *opaque, BlockDriverState *bs)
+void iostatus_bdrv_it(void *opaque, BlockDriverState *bs)
 {
 bdrv_iostatus_reset(bs);
 }
diff --git a/vl.c b/vl.c
index 8904db1..c7cee31 100644
--- a/vl.c
+++ b/vl.c
@@ -331,7 +331,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_INMIGRATE, RUN_STATE_PRELAUNCH },
 
-{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
+{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_RUNNING },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
 
 { RUN_STATE_IO_ERROR, RUN_STATE_RUNNING },
@@ -364,7 +364,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
-{ RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
+{ RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING },
 { RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
 
 { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
@@ -1532,7 +1532,8 @@ static bool main_loop_should_exit(void)
 resume_all_vcpus();
 if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
 runstate_check(RUN_STATE_SHUTDOWN)) {
-runstate_set(RUN_STATE_PAUSED);
+bdrv_iterate(iostatus_bdrv_it, NULL);
+vm_start();
 }
 }
 if (qemu_powerdown_requested()) {
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6 v7] kvm: Update kernel headers

2012-07-21 Thread Wen Congyang
Corresponding kvm.git hash: 37e41afa and apply my patch for kvm

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 linux-headers/asm-x86/kvm_para.h |2 ++
 linux-headers/linux/kvm_para.h   |6 ++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index f2ac46a..f9d858f 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -89,5 +89,7 @@ struct kvm_vcpu_pv_apf_data {
__u32 enabled;
 };
 
+#define KVM_PV_PORT(0x505UL)
+
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index 7bdcf93..4fbd625 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -20,6 +20,12 @@
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
 
+/* The bit of the value read from KVM_PV_PORT */
+#define KVM_PV_FEATURE_PANICKED0
+
+/* The value writen to KVM_PV_PORT */
+#define KVM_PV_PANICKED1
+
 /*
  * hypercalls use architecture specific
  */
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6 v7] add a new runstate: RUN_STATE_GUEST_PANICKED

2012-07-21 Thread Wen Congyang
The guest will be in this state when it is panicked.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 qapi-schema.json |6 +-
 qmp.c|3 ++-
 vl.c |7 ++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index a92adb1..5889af7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -119,11 +119,15 @@
 # @suspended: guest is suspended (ACPI S3)
 #
 # @watchdog: the watchdog action is configured to pause and has been triggered
+#
+# @guest-panicked: the panicked action is configured to pause and has been
+# triggered.
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
-'running', 'save-vm', 'shutdown', 'suspended', 'watchdog' ] }
+'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
+'guest-panicked' ] }
 
 ##
 # @StatusInfo:
diff --git a/qmp.c b/qmp.c
index a111dff..3b0c9bc 100644
--- a/qmp.c
+++ b/qmp.c
@@ -148,7 +148,8 @@ void qmp_cont(Error **errp)
 error_set(errp, QERR_MIGRATION_EXPECTED);
 return;
 } else if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-   runstate_check(RUN_STATE_SHUTDOWN)) {
+   runstate_check(RUN_STATE_SHUTDOWN) ||
+   runstate_check(RUN_STATE_GUEST_PANICKED)) {
 error_set(errp, QERR_RESET_REQUIRED);
 return;
 } else if (runstate_check(RUN_STATE_SUSPENDED)) {
diff --git a/vl.c b/vl.c
index c7cee31..5e33c97 100644
--- a/vl.c
+++ b/vl.c
@@ -361,6 +361,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_RUNNING, RUN_STATE_SAVE_VM },
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
 { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
+{ RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -375,6 +376,9 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
+
 { RUN_STATE_MAX, RUN_STATE_MAX },
 };
 
@@ -1531,7 +1535,8 @@ static bool main_loop_should_exit(void)
 qemu_system_reset(VMRESET_REPORT);
 resume_all_vcpus();
 if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-runstate_check(RUN_STATE_SHUTDOWN)) {
+runstate_check(RUN_STATE_SHUTDOWN) ||
+runstate_check(RUN_STATE_GUEST_PANICKED)) {
 bdrv_iterate(iostatus_bdrv_it, NULL);
 vm_start();
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6 v7] add a new qevent: QEVENT_GUEST_PANICKED

2012-07-21 Thread Wen Congyang
This event will be emited when the guest is panicked.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 monitor.c |1 +
 monitor.h |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/monitor.c b/monitor.c
index 09aa3cd..a388e61 100644
--- a/monitor.c
+++ b/monitor.c
@@ -458,6 +458,7 @@ static const char *monitor_event_names[] = {
 [QEVENT_SUSPEND] = SUSPEND,
 [QEVENT_WAKEUP] = WAKEUP,
 [QEVENT_BALLOON_CHANGE] = BALLOON_CHANGE,
+[QEVENT_GUEST_PANICKED] = GUEST_PANICKED,
 };
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
 
diff --git a/monitor.h b/monitor.h
index 5f4de1b..cb7de8c 100644
--- a/monitor.h
+++ b/monitor.h
@@ -42,6 +42,7 @@ typedef enum MonitorEvent {
 QEVENT_SUSPEND,
 QEVENT_WAKEUP,
 QEVENT_BALLOON_CHANGE,
+QEVENT_GUEST_PANICKED,
 
 /* Add to 'monitor_event_names' array in monitor.c when
  * defining new events here */
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6 v7] introduce a new qom device to deal with panicked event

2012-07-21 Thread Wen Congyang
If the target is x86/x86_64, the guest's kernel will write 0x01 to the
port KVM_PV_PORT when it is panciked. This patch introduces a new qom
device kvm_pv_ioport to listen this I/O port, and deal with panicked
event according to panicked_action's value. The possible actions are:
1. emit QEVENT_GUEST_PANICKED only
2. emit QEVENT_GUEST_PANICKED and pause the guest
3. emit QEVENT_GUEST_PANICKED and poweroff the guest
4. emit QEVENT_GUEST_PANICKED and reset the guest

I/O ports does not work for some targets(for example: s390). And you
can implement another qom device, and include it's code into pv_event.c
for such target.

Note: if we emit QEVENT_GUEST_PANICKED only, and the management
application does not receive this event(the management may not
run when the event is emitted), the management won't know the
guest is panicked.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 hw/kvm/Makefile.objs |2 +-
 hw/kvm/pv_event.c|  109 ++
 hw/kvm/pv_ioport.c   |   93 ++
 hw/pc_piix.c |9 
 kvm.h|2 +
 5 files changed, 214 insertions(+), 1 deletions(-)
 create mode 100644 hw/kvm/pv_event.c
 create mode 100644 hw/kvm/pv_ioport.c

diff --git a/hw/kvm/Makefile.objs b/hw/kvm/Makefile.objs
index 226497a..23e3b30 100644
--- a/hw/kvm/Makefile.objs
+++ b/hw/kvm/Makefile.objs
@@ -1 +1 @@
-obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o
+obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o pv_event.o
diff --git a/hw/kvm/pv_event.c b/hw/kvm/pv_event.c
new file mode 100644
index 000..d292926
--- /dev/null
+++ b/hw/kvm/pv_event.c
@@ -0,0 +1,109 @@
+/*
+ * QEMU KVM support, paravirtual event device
+ *
+ * Copyright Fujitsu, Corp. 2012
+ *
+ * Authors:
+ * Wen Congyang we...@cn.fujitsu.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include linux/kvm_para.h
+#include asm/kvm_para.h
+#include qobject.h
+#include qjson.h
+#include monitor.h
+#include sysemu.h
+#include kvm.h
+
+/* Possible values for action parameter. */
+#define PANICKED_REPORT 1   /* emit QEVENT_GUEST_PANICKED only */
+#define PANICKED_PAUSE  2   /* emit QEVENT_GUEST_PANICKED and pause VM */
+#define PANICKED_POWEROFF   3   /* emit QEVENT_GUEST_PANICKED and quit VM */
+#define PANICKED_RESET  4   /* emit QEVENT_GUEST_PANICKED and reset VM */
+
+#define PV_EVENT_DRIVER kvm_pv_event
+
+struct pv_event_action {
+char *panicked_action;
+int panicked_action_value;
+};
+
+#define DEFINE_PV_EVENT_PROPERTIES(_state, _conf)   \
+DEFINE_PROP_STRING(panicked_action, _state, _conf.panicked_action)
+
+static void panicked_mon_event(const char *action)
+{
+QObject *data;
+
+data = qobject_from_jsonf({ 'action': %s }, action);
+monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
+qobject_decref(data);
+}
+
+static void panicked_perform_action(uint32_t panicked_action)
+{
+switch (panicked_action) {
+case PANICKED_REPORT:
+panicked_mon_event(report);
+break;
+
+case PANICKED_PAUSE:
+panicked_mon_event(pause);
+vm_stop(RUN_STATE_GUEST_PANICKED);
+break;
+
+case PANICKED_POWEROFF:
+panicked_mon_event(poweroff);
+qemu_system_shutdown_request();
+break;
+case PANICKED_RESET:
+panicked_mon_event(reset);
+qemu_system_reset_request();
+break;
+}
+}
+
+static uint64_t supported_event(void)
+{
+return 1  KVM_PV_FEATURE_PANICKED;
+}
+
+static void handle_event(int event, struct pv_event_action *conf)
+{
+if (event == KVM_PV_PANICKED) {
+panicked_perform_action(conf-panicked_action_value);
+}
+}
+
+static int pv_event_init(struct pv_event_action *conf)
+{
+if (!conf-panicked_action) {
+conf-panicked_action_value = PANICKED_REPORT;
+} else if (strcasecmp(conf-panicked_action, none) == 0) {
+conf-panicked_action_value = PANICKED_REPORT;
+} else if (strcasecmp(conf-panicked_action, pause) == 0) {
+conf-panicked_action_value = PANICKED_PAUSE;
+} else if (strcasecmp(conf-panicked_action, poweroff) == 0) {
+conf-panicked_action_value = PANICKED_POWEROFF;
+} else if (strcasecmp(conf-panicked_action, reset) == 0) {
+conf-panicked_action_value = PANICKED_RESET;
+} else {
+return -1;
+}
+
+return 0;
+}
+
+#if defined(KVM_PV_PORT)
+
+#include pv_ioport.c
+
+#else
+void kvm_pv_event_init(void *opaque)
+{
+}
+#endif
diff --git a/hw/kvm/pv_ioport.c b/hw/kvm/pv_ioport.c
new file mode 100644
index 000..2dbb75b
--- /dev/null
+++ b/hw/kvm/pv_ioport.c
@@ -0,0 +1,93 @@
+/*
+ * QEMU KVM support, paravirtual I/O port device
+ *
+ * Copyright Fujitsu, Corp. 2012
+ *
+ * Authors:
+ * Wen Congyang we...@cn.fujitsu.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or 

[PATCH 6/6 v7] allow the user to disable pv event support

2012-07-21 Thread Wen Congyang
The qom device uses a fixed PIO port that might conflict
with (non-Linux) guest expectations and/or future device models.
So allow the user to disable it.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 hw/pc_piix.c|6 +-
 qemu-config.c   |4 
 qemu-options.hx |3 ++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 7ec2853..48fae72 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -151,6 +151,8 @@ static void pc_init1(MemoryRegion *system_memory,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 void *fw_cfg = NULL;
+QemuOptsList *list = qemu_find_opts(machine);
+bool enable_pv_event;
 
 pc_cpus_init(cpu_model);
 
@@ -289,8 +291,10 @@ static void pc_init1(MemoryRegion *system_memory,
 pc_pci_device_init(pci_bus);
 }
 
+enable_pv_event = qemu_opt_get_bool(QTAILQ_FIRST(list-head),
+enable_pv_event, false);
 #ifdef KVM_PV_PORT
-if (kvm_enabled()) {
+if (kvm_enabled()  enable_pv_event) {
 kvm_pv_event_init(isa_bus);
 }
 #endif
diff --git a/qemu-config.c b/qemu-config.c
index 5c3296b..5ec5aa9 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -595,6 +595,10 @@ static QemuOptsList qemu_machine_opts = {
 .name = dt_compatible,
 .type = QEMU_OPT_STRING,
 .help = Overrides the \compatible\ property of the dt root 
node,
+}, {
+.name = enable_pv_event,
+.type = QEMU_OPT_BOOL,
+.help = handle pv event
 },
 { /* End of list */ }
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 97245a3..5661918 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -33,7 +33,8 @@ DEF(machine, HAS_ARG, QEMU_OPTION_machine, \
 property accel=accel1[:accel2[:...]] selects 
accelerator\n
 supported accelerators are kvm, xen, tcg (default: tcg)\n
 kernel_irqchip=on|off controls accelerated irqchip 
support\n
-kvm_shadow_mem=size of KVM shadow MMU\n,
+kvm_shadow_mem=size of KVM shadow MMU\n
+enable_pv_event=on|off controls pv event support\n,
 QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7] kvm: notify host when the guest is panicked

2012-07-21 Thread Jan Kiszka
On 2012-07-21 09:12, Wen Congyang wrote:
 We can know the guest is panicked when the guest runs on xen.
 But we do not have such feature on kvm.
 
 Another purpose of this feature is: management app(for example:
 libvirt) can do auto dump when the guest is panicked. If management
 app does not do auto dump, the guest's user can do dump by hand if
 he sees the guest is panicked.
 
 We have three solutions to implement this feature:
 1. use vmcall
 2. use I/O port
 3. use virtio-serial.
 
 We have decided to avoid touching hypervisor. The reason why I choose
 choose the I/O port is:
 1. it is easier to implememt
 2. it does not depend any virtual device
 3. it can work when startint the kernel
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  arch/ia64/include/asm/kvm_para.h|5 +
  arch/powerpc/include/asm/kvm_para.h |5 +
  arch/s390/include/asm/kvm_para.h|5 +
  arch/x86/include/asm/kvm_para.h |7 +++
  arch/x86/kernel/kvm.c   |   14 ++
  include/linux/kvm_para.h|   13 +
  6 files changed, 49 insertions(+), 0 deletions(-)
 
 diff --git a/arch/ia64/include/asm/kvm_para.h 
 b/arch/ia64/include/asm/kvm_para.h
 index 2019cb9..187c0e2 100644
 --- a/arch/ia64/include/asm/kvm_para.h
 +++ b/arch/ia64/include/asm/kvm_para.h
 @@ -31,6 +31,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
   return false;
  }
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 + return 0;
 +}
 +
  #endif
  
  #endif
 diff --git a/arch/powerpc/include/asm/kvm_para.h 
 b/arch/powerpc/include/asm/kvm_para.h
 index c18916b..be81aac 100644
 --- a/arch/powerpc/include/asm/kvm_para.h
 +++ b/arch/powerpc/include/asm/kvm_para.h
 @@ -211,6 +211,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
   return false;
  }
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 + return 0;
 +}
 +
  #endif /* __KERNEL__ */
  
  #endif /* __POWERPC_KVM_PARA_H__ */
 diff --git a/arch/s390/include/asm/kvm_para.h 
 b/arch/s390/include/asm/kvm_para.h
 index a988329..3d993b7 100644
 --- a/arch/s390/include/asm/kvm_para.h
 +++ b/arch/s390/include/asm/kvm_para.h
 @@ -154,6 +154,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
   return false;
  }
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 + return 0;
 +}
 +
  #endif
  
  #endif /* __S390_KVM_PARA_H */
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 63ab166..c8ad86e 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -89,6 +89,8 @@ struct kvm_vcpu_pv_apf_data {
   __u32 enabled;
  };
  
 +#define KVM_PV_PORT  (0x505UL)
 +
  #ifdef __KERNEL__
  #include asm/processor.h
  
 @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
  }
  #endif
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 + return inl(KVM_PV_PORT);
 +}
 +
  #endif /* __KERNEL__ */
  
  #endif /* _ASM_X86_KVM_PARA_H */
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index e554e5a..9a97f7e 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -328,6 +328,17 @@ static struct notifier_block kvm_pv_reboot_nb = {
   .notifier_call = kvm_pv_reboot_notify,
  };
  
 +static int
 +kvm_pv_panic_notify(struct notifier_block *nb, unsigned long code, void 
 *unused)
 +{
 + outl(KVM_PV_PANICKED, KVM_PV_PORT);
 + return NOTIFY_DONE;
 +}
 +
 +static struct notifier_block kvm_pv_panic_nb = {
 + .notifier_call = kvm_pv_panic_notify,
 +};
 +
  static u64 kvm_steal_clock(int cpu)
  {
   u64 steal;
 @@ -414,6 +425,9 @@ void __init kvm_guest_init(void)
  
   paravirt_ops_setup();
   register_reboot_notifier(kvm_pv_reboot_nb);
 + if (kvm_pv_has_feature(KVM_PV_FEATURE_PANICKED))
 + atomic_notifier_chain_register(panic_notifier_list,
 + kvm_pv_panic_nb);
   for (i = 0; i  KVM_TASK_SLEEP_HASHSIZE; i++)
   spin_lock_init(async_pf_sleepers[i].lock);
   if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
 diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
 index ff476dd..e73efcf 100644
 --- a/include/linux/kvm_para.h
 +++ b/include/linux/kvm_para.h
 @@ -20,6 +20,12 @@
  #define KVM_HC_FEATURES  3
  #define KVM_HC_PPC_MAP_MAGIC_PAGE4
  
 +/* The bit of the value read from KVM_PV_PORT */
 +#define KVM_PV_FEATURE_PANICKED  0
 +
 +/* The value writen to KVM_PV_PORT */
 +#define KVM_PV_PANICKED  1
 +
  /*
   * hypercalls use architecture specific
   */
 @@ -33,5 +39,12 @@ static inline int kvm_para_has_feature(unsigned int 
 feature)
   return 1;
   return 0;
  }
 +
 +static inline int kvm_pv_has_feature(unsigned int feature)
 +{
 + if (kvm_arch_pv_features()  (1UL  feature))

Reading from an invalid I/O port will return -1. So your test will
deliver a wrong result on a platform that 

Re: [PATCH v7] kvm: notify host when the guest is panicked

2012-07-21 Thread Wen Congyang
At 07/21/2012 03:19 PM, Jan Kiszka Wrote:
 On 2012-07-21 09:12, Wen Congyang wrote:
 We can know the guest is panicked when the guest runs on xen.
 But we do not have such feature on kvm.

 Another purpose of this feature is: management app(for example:
 libvirt) can do auto dump when the guest is panicked. If management
 app does not do auto dump, the guest's user can do dump by hand if
 he sees the guest is panicked.

 We have three solutions to implement this feature:
 1. use vmcall
 2. use I/O port
 3. use virtio-serial.

 We have decided to avoid touching hypervisor. The reason why I choose
 choose the I/O port is:
 1. it is easier to implememt
 2. it does not depend any virtual device
 3. it can work when startint the kernel

 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  arch/ia64/include/asm/kvm_para.h|5 +
  arch/powerpc/include/asm/kvm_para.h |5 +
  arch/s390/include/asm/kvm_para.h|5 +
  arch/x86/include/asm/kvm_para.h |7 +++
  arch/x86/kernel/kvm.c   |   14 ++
  include/linux/kvm_para.h|   13 +
  6 files changed, 49 insertions(+), 0 deletions(-)

 diff --git a/arch/ia64/include/asm/kvm_para.h 
 b/arch/ia64/include/asm/kvm_para.h
 index 2019cb9..187c0e2 100644
 --- a/arch/ia64/include/asm/kvm_para.h
 +++ b/arch/ia64/include/asm/kvm_para.h
 @@ -31,6 +31,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
  return false;
  }
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 +return 0;
 +}
 +
  #endif
  
  #endif
 diff --git a/arch/powerpc/include/asm/kvm_para.h 
 b/arch/powerpc/include/asm/kvm_para.h
 index c18916b..be81aac 100644
 --- a/arch/powerpc/include/asm/kvm_para.h
 +++ b/arch/powerpc/include/asm/kvm_para.h
 @@ -211,6 +211,11 @@ static inline bool 
 kvm_check_and_clear_guest_paused(void)
  return false;
  }
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 +return 0;
 +}
 +
  #endif /* __KERNEL__ */
  
  #endif /* __POWERPC_KVM_PARA_H__ */
 diff --git a/arch/s390/include/asm/kvm_para.h 
 b/arch/s390/include/asm/kvm_para.h
 index a988329..3d993b7 100644
 --- a/arch/s390/include/asm/kvm_para.h
 +++ b/arch/s390/include/asm/kvm_para.h
 @@ -154,6 +154,11 @@ static inline bool 
 kvm_check_and_clear_guest_paused(void)
  return false;
  }
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 +return 0;
 +}
 +
  #endif
  
  #endif /* __S390_KVM_PARA_H */
 diff --git a/arch/x86/include/asm/kvm_para.h 
 b/arch/x86/include/asm/kvm_para.h
 index 63ab166..c8ad86e 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -89,6 +89,8 @@ struct kvm_vcpu_pv_apf_data {
  __u32 enabled;
  };
  
 +#define KVM_PV_PORT (0x505UL)
 +
  #ifdef __KERNEL__
  #include asm/processor.h
  
 @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
  }
  #endif
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 +return inl(KVM_PV_PORT);
 +}
 +
  #endif /* __KERNEL__ */
  
  #endif /* _ASM_X86_KVM_PARA_H */
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index e554e5a..9a97f7e 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -328,6 +328,17 @@ static struct notifier_block kvm_pv_reboot_nb = {
  .notifier_call = kvm_pv_reboot_notify,
  };
  
 +static int
 +kvm_pv_panic_notify(struct notifier_block *nb, unsigned long code, void 
 *unused)
 +{
 +outl(KVM_PV_PANICKED, KVM_PV_PORT);
 +return NOTIFY_DONE;
 +}
 +
 +static struct notifier_block kvm_pv_panic_nb = {
 +.notifier_call = kvm_pv_panic_notify,
 +};
 +
  static u64 kvm_steal_clock(int cpu)
  {
  u64 steal;
 @@ -414,6 +425,9 @@ void __init kvm_guest_init(void)
  
  paravirt_ops_setup();
  register_reboot_notifier(kvm_pv_reboot_nb);
 +if (kvm_pv_has_feature(KVM_PV_FEATURE_PANICKED))
 +atomic_notifier_chain_register(panic_notifier_list,
 +kvm_pv_panic_nb);
  for (i = 0; i  KVM_TASK_SLEEP_HASHSIZE; i++)
  spin_lock_init(async_pf_sleepers[i].lock);
  if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
 diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
 index ff476dd..e73efcf 100644
 --- a/include/linux/kvm_para.h
 +++ b/include/linux/kvm_para.h
 @@ -20,6 +20,12 @@
  #define KVM_HC_FEATURES 3
  #define KVM_HC_PPC_MAP_MAGIC_PAGE   4
  
 +/* The bit of the value read from KVM_PV_PORT */
 +#define KVM_PV_FEATURE_PANICKED 0
 +
 +/* The value writen to KVM_PV_PORT */
 +#define KVM_PV_PANICKED 1
 +
  /*
   * hypercalls use architecture specific
   */
 @@ -33,5 +39,12 @@ static inline int kvm_para_has_feature(unsigned int 
 feature)
  return 1;
  return 0;
  }
 +
 +static inline int kvm_pv_has_feature(unsigned int feature)
 +{
 +if (kvm_arch_pv_features()  (1UL  feature))
 
 Reading from an invalid I/O port will return -1. So your test will
 deliver a wrong result on a 

RE: [PATCH 2/2 v5] KVM: PPC: booke: Add watchdog emulation

2012-07-21 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, July 21, 2012 2:59 AM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Bhushan 
 Bharat-
 R65777
 Subject: Re: [PATCH 2/2 v5] KVM: PPC: booke: Add watchdog emulation
 
 On 07/20/2012 12:00 AM, Bharat Bhushan wrote:
  This patch adds the watchdog emulation in KVM. The watchdog emulation
  is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_WDT) ioctl.
  The kernel timer are used for watchdog emulation and emulates h/w
  watchdog state machine. On watchdog timer expiry, it exit to QEMU if
  TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how it
  is configured.
 
  Signed-off-by: Liu Yu yu@freescale.com
  Signed-off-by: Scott Wood scottw...@freescale.com
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  [bharat.bhus...@freescale.com: reworked patch]
 
 Typically the [] note goes immediately before your signoff (but after the
 others).

ok

 
  +static void arm_next_watchdog(struct kvm_vcpu *vcpu) {
  +   unsigned long nr_jiffies;
  +
  +   spin_lock(vcpu-arch.wdt_lock);
  +   nr_jiffies = watchdog_next_timeout(vcpu);
  +   /*
  +* If the number of jiffies of watchdog timer = NEXT_TIMER_MAX_DELTA
  +* then do not run the watchdog timer as this can break timer APIs.
  +*/
  +   if (nr_jiffies  NEXT_TIMER_MAX_DELTA)
  +   mod_timer(vcpu-arch.wdt_timer, jiffies + nr_jiffies);
  +   else
  +   del_timer(vcpu-arch.wdt_timer);
  +   spin_unlock(vcpu-arch.wdt_lock);
  +}
 
 This needs to be an irqsave lock.

Ok, I want to understood why irqsave lock should be used here?
irqsave_lock is used when the critical section within lock can be referenced 
from interrupt context also. What part of critical section above can get 
affected from local irq? Is it jiffies here?


 
  @@ -386,13 +387,23 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
  #ifdef CONFIG_KVM_EXIT_TIMING
  mutex_init(vcpu-arch.exit_timing_lock);
   #endif
  -
  +#ifdef CONFIG_BOOKE
  +   spin_lock_init(vcpu-arch.wdt_lock);
  +   /* setup watchdog timer once */
  +   setup_timer(vcpu-arch.wdt_timer, kvmppc_watchdog_func,
  +   (unsigned long)vcpu);
  +#endif
  return 0;
   }
 
 Can you do this in kvmppc_booke_init()?

Ok, so the *_uninit part of code also needed to be moved in respective function.

 
 
   void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)  {
  kvmppc_mmu_destroy(vcpu);
  +#ifdef CONFIG_BOOKE
  +   spin_lock(vcpu-arch.wdt_lock);
  +   del_timer(vcpu-arch.wdt_timer);
  +   spin_unlock(vcpu-arch.wdt_lock);
  +#endif
   }
 
 Don't acquire the lock here, but use del_timer_sync().

Ok.

Thanks
-Bharat

 
 -Scott



[PATCH v7.5] kvm: notify host when the guest is panicked

2012-07-21 Thread Wen Congyang
We can know the guest is panicked when the guest runs on xen.
But we do not have such feature on kvm.

Another purpose of this feature is: management app(for example:
libvirt) can do auto dump when the guest is panicked. If management
app does not do auto dump, the guest's user can do dump by hand if
he sees the guest is panicked.

We have three solutions to implement this feature:
1. use vmcall
2. use I/O port
3. use virtio-serial.

We have decided to avoid touching hypervisor. The reason why I choose
choose the I/O port is:
1. it is easier to implememt
2. it does not depend any virtual device
3. it can work when starting the kernel

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/ia64/include/asm/kvm_para.h|5 +
 arch/powerpc/include/asm/kvm_para.h |5 +
 arch/s390/include/asm/kvm_para.h|5 +
 arch/x86/include/asm/kvm_para.h |   13 +
 arch/x86/kernel/kvm.c   |   14 ++
 include/linux/kvm_para.h|   13 +
 6 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h
index 2019cb9..187c0e2 100644
--- a/arch/ia64/include/asm/kvm_para.h
+++ b/arch/ia64/include/asm/kvm_para.h
@@ -31,6 +31,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index c18916b..be81aac 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -211,6 +211,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* __POWERPC_KVM_PARA_H__ */
diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h
index a988329..3d993b7 100644
--- a/arch/s390/include/asm/kvm_para.h
+++ b/arch/s390/include/asm/kvm_para.h
@@ -154,6 +154,11 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* __S390_KVM_PARA_H */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 63ab166..647561b 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,6 +89,8 @@ struct kvm_vcpu_pv_apf_data {
__u32 enabled;
 };
 
+#define KVM_PV_PORT(0x505UL)
+
 #ifdef __KERNEL__
 #include asm/processor.h
 
@@ -221,6 +223,17 @@ static inline void kvm_disable_steal_time(void)
 }
 #endif
 
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   unsigned int features = inl(KVM_PV_PORT);
+
+   /* Reading from an invalid I/O port will return -1 */
+   if (features == ~0)
+   features = 0;
+
+   return features;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..9a97f7e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -328,6 +328,17 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
+static int
+kvm_pv_panic_notify(struct notifier_block *nb, unsigned long code, void 
*unused)
+{
+   outl(KVM_PV_PANICKED, KVM_PV_PORT);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_pv_panic_nb = {
+   .notifier_call = kvm_pv_panic_notify,
+};
+
 static u64 kvm_steal_clock(int cpu)
 {
u64 steal;
@@ -414,6 +425,9 @@ void __init kvm_guest_init(void)
 
paravirt_ops_setup();
register_reboot_notifier(kvm_pv_reboot_nb);
+   if (kvm_pv_has_feature(KVM_PV_FEATURE_PANICKED))
+   atomic_notifier_chain_register(panic_notifier_list,
+   kvm_pv_panic_nb);
for (i = 0; i  KVM_TASK_SLEEP_HASHSIZE; i++)
spin_lock_init(async_pf_sleepers[i].lock);
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..e73efcf 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -20,6 +20,12 @@
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
 
+/* The bit of the value read from KVM_PV_PORT */
+#define KVM_PV_FEATURE_PANICKED0
+
+/* The value writen to KVM_PV_PORT */
+#define KVM_PV_PANICKED1
+
 /*
  * hypercalls use architecture specific
  */
@@ -33,5 +39,12 @@ static inline int kvm_para_has_feature(unsigned int feature)
return 1;
return 0;
 }
+
+static inline int kvm_pv_has_feature(unsigned int feature)
+{
+   if (kvm_arch_pv_features()  (1UL  feature))
+   return 1;
+   return 0;
+}
 #endif /* __KERNEL__ */
 #endif /* 

Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Peter Maydell
On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-20 21:14, Peter Maydell wrote:
 I'm sure this isn't the only x86ism in the KVM generic source
 files. However the thing I'm specifically trying to do is
 nuke all the uses of kvm_irqchip_in_kernel() in common code,

 No, irqchip in kernel is supposed to be a generic concept. We will
 also have it on Power. Not sure what your plans are for ARM, maybe it
 will always be true there.

I agree that irqchip in kernel? is generic (though as you'll see
below there's disagreement about what that ought to mean or imply).
irq0_override though seems to me to be absolutely x86 specific.

 That said, maybe there is room for discussion about what it means for
 the general KVM code and its users if the irqchip is in the kernel. Two
 things that should be common for every arch:
  - VCPU idle management is done inside the kernel

The trouble is that at the moment QEMU assumes that is the
irqchip in kernel? == is VCPU idle management in kernel, for
instance. For ARM, VCPU idle management is in kernel whether
we're using the kernel's model of the VGIC or not. Alex tells
me PPC is the same way. It's only x86 that has tied these two
concepts together.

The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel()
is because I think they're all similar to this -- the common code is
using the check as a proxy for something else, and it should be directly
asking about that something else. The only bits of code that should
care about is the irqchip in kernel? are:
 * target-specific device/machine setup code which needs to know
   which apic/etc to instantiate
 * target-specific x86 code which has this weird synchronous IRQ
   delivery model for irqchip-not-in-kernel
(Obviously I might have missed something, I'm flailing around
trying to understand this code :-))

  - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

 The latter point implies that irqfd is available and that interrupt
 routes from virtual IRQs (*) (like the one associated with an irqfd) to
 the in-kernel IRQ controller have to be established. That's pretty generic.

But you can perfectly well have an in-kernel-irqchip that doesn't
support irqfd -- it just means that interrupts from devices have
to come in via the ioctls same as anything else. Some in-kernel
helpers obviously would depend on the existence and use of a full
featured in-kernel irqchip (on ARM you don't get the in kernel timer
unless you have in kernel VGIC), but I don't see why the virtio code
should be asking is there an in kernel irqchip? rather than can
I do irqfd routing? or whatever the question is it actually wants
to ask.  (In fact the virtio code probably needs to do something
more complex anyway: you could perfectly well have a system where
there is a full-featured irqchip in the kernel but the virtio
device is on the wrong side of a second interrupt controller
which is not in-kernel. So the actual question it needs to ask
is does the interrupt wiring in this specific machine model mean
I can get and use an irqfd from where I am to the main CPU
interrupt controller? or something similar.)

 as part of trying to untangle what irqchip model do you want
 from other aspects of the interrupt handling model.

 Other than this one, there are uses in cpus.c and hw/virtio-pci.c,
 but I haven't figured out yet exactly what those bits of code
 are trying to do...

 See above for the two reasons.

Thanks for the explanations.

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-21 10:54, Peter Maydell wrote:
 On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-20 21:14, Peter Maydell wrote:
 I'm sure this isn't the only x86ism in the KVM generic source
 files. However the thing I'm specifically trying to do is
 nuke all the uses of kvm_irqchip_in_kernel() in common code,

 No, irqchip in kernel is supposed to be a generic concept. We will
 also have it on Power. Not sure what your plans are for ARM, maybe it
 will always be true there.
 
 I agree that irqchip in kernel? is generic (though as you'll see
 below there's disagreement about what that ought to mean or imply).
 irq0_override though seems to me to be absolutely x86 specific.

Naming is x86 specific, semantic not. It means that KVM doesn't prevent
remapping of IRQs. Granted, I really hope you don't make such mistakes
in your arch.

 
 That said, maybe there is room for discussion about what it means for
 the general KVM code and its users if the irqchip is in the kernel. Two
 things that should be common for every arch:
  - VCPU idle management is done inside the kernel
 
 The trouble is that at the moment QEMU assumes that is the
 irqchip in kernel? == is VCPU idle management in kernel, for
 instance. For ARM, VCPU idle management is in kernel whether
 we're using the kernel's model of the VGIC or not. Alex tells
 me PPC is the same way. It's only x86 that has tied these two
 concepts together.

Hmm, and why does Power work despite this mismatch?

If cpu_thread_is_idle doesn't work for you, define something like
kvm_idle_in_kernel() to replace kvm_irqchip_in_kernel here.

 
 The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel()
 is because I think they're all similar to this -- the common code is
 using the check as a proxy for something else, and it should be directly
 asking about that something else. The only bits of code that should
 care about is the irqchip in kernel? are:
  * target-specific device/machine setup code which needs to know
which apic/etc to instantiate
  * target-specific x86 code which has this weird synchronous IRQ
delivery model for irqchip-not-in-kernel
 (Obviously I might have missed something, I'm flailing around
 trying to understand this code :-))
 
  - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

 The latter point implies that irqfd is available and that interrupt
 routes from virtual IRQs (*) (like the one associated with an irqfd) to
 the in-kernel IRQ controller have to be established. That's pretty generic.
 
 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd 

You could, thought this doesn't make much sense.

 -- it just means that interrupts from devices have
 to come in via the ioctls same as anything else. Some in-kernel
 helpers obviously would depend on the existence and use of a full
 featured in-kernel irqchip (on ARM you don't get the in kernel timer
 unless you have in kernel VGIC), but I don't see why the virtio code
 should be asking is there an in kernel irqchip? rather than can
 I do irqfd routing? or whatever the question is it actually wants
 to ask.  (In fact the virtio code probably needs to do something
 more complex anyway: you could perfectly well have a system where
 there is a full-featured irqchip in the kernel but the virtio
 device is on the wrong side of a second interrupt controller
 which is not in-kernel. So the actual question it needs to ask
 is does the interrupt wiring in this specific machine model mean
 I can get and use an irqfd from where I am to the main CPU
 interrupt controller? or something similar.)

Well, same here: then define more precise generic test functions.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Peter Maydell
On 21 July 2012 10:14, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 10:54, Peter Maydell wrote:
 On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-20 21:14, Peter Maydell wrote:
 I'm sure this isn't the only x86ism in the KVM generic source
 files. However the thing I'm specifically trying to do is
 nuke all the uses of kvm_irqchip_in_kernel() in common code,

 No, irqchip in kernel is supposed to be a generic concept. We will
 also have it on Power. Not sure what your plans are for ARM, maybe it
 will always be true there.

 I agree that irqchip in kernel? is generic (though as you'll see
 below there's disagreement about what that ought to mean or imply).
 irq0_override though seems to me to be absolutely x86 specific.

 Naming is x86 specific, semantic not. It means that KVM doesn't prevent
 remapping of IRQs. Granted, I really hope you don't make such mistakes
 in your arch.

What does remapping of IRQs mean here? This is still sounding
not very generic to me, in that I really don't know what it would
mean in an ARM context. The fact that the only caller of this is
in hw/pc.c is also a big red flag that this isn't exactly generic.

 That said, maybe there is room for discussion about what it means for
 the general KVM code and its users if the irqchip is in the kernel. Two
 things that should be common for every arch:
  - VCPU idle management is done inside the kernel

 The trouble is that at the moment QEMU assumes that is the
 irqchip in kernel? == is VCPU idle management in kernel, for
 instance. For ARM, VCPU idle management is in kernel whether
 we're using the kernel's model of the VGIC or not. Alex tells
 me PPC is the same way. It's only x86 that has tied these two
 concepts together.

 Hmm, and why does Power work despite this mismatch?

I think because hw/ppc.c:ppc_set_irq() both calls cpu_interrupt()
and also kvmppc_set_interrupt(), so we end up with a sort of odd
mix of both models... Alex?

 If cpu_thread_is_idle doesn't work for you, define something like
 kvm_idle_in_kernel() to replace kvm_irqchip_in_kernel here.

Yes, this is kind of where I'm headed. I thought I'd start with this
patch as the easiest one first, though...

 The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel()
 is because I think they're all similar to this -- the common code is
 using the check as a proxy for something else, and it should be directly
 asking about that something else. The only bits of code that should
 care about is the irqchip in kernel? are:
  * target-specific device/machine setup code which needs to know
which apic/etc to instantiate
  * target-specific x86 code which has this weird synchronous IRQ
delivery model for irqchip-not-in-kernel
 (Obviously I might have missed something, I'm flailing around
 trying to understand this code :-))

  - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

 The latter point implies that irqfd is available and that interrupt
 routes from virtual IRQs (*) (like the one associated with an irqfd) to
 the in-kernel IRQ controller have to be established. That's pretty generic.

 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd

 You could, thought this doesn't make much sense.

Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take
advantage of the hardware support for a virtual GIC, and you can use
the virtual timer support too. These are both big performance advantages
even if QEMU never does anything with irqfds. (In fact the current
ARM KVM VGIC code doesn't support irqfds as far as I can see from
a quick scan of the kernel code.)

 -- it just means that interrupts from devices have
 to come in via the ioctls same as anything else. Some in-kernel
 helpers obviously would depend on the existence and use of a full
 featured in-kernel irqchip (on ARM you don't get the in kernel timer
 unless you have in kernel VGIC), but I don't see why the virtio code
 should be asking is there an in kernel irqchip? rather than can
 I do irqfd routing? or whatever the question is it actually wants
 to ask.  (In fact the virtio code probably needs to do something
 more complex anyway: you could perfectly well have a system where
 there is a full-featured irqchip in the kernel but the virtio
 device is on the wrong side of a second interrupt controller
 which is not in-kernel. So the actual question it needs to ask
 is does the interrupt wiring in this specific machine model mean
 I can get and use an irqfd from where I am to the main CPU
 interrupt controller? or something similar.)

 Well, same here: then define more precise generic test functions.

First I have to understand what the current code is trying to do;
your explanations have been very helpful in that respect.

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] Recognize PCID feature

2012-07-21 Thread Stefan Hajnoczi
On Fri, Jul 20, 2012 at 07:08:21AM +, Mao, Junjie wrote:
 This patch makes Qemu recognize the PCID feature specified from configuration 
 or command line options.
 
 Signed-off-by: Junjie Mao junjie@intel.com
 ---
  target-i386/cpu.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Thanks, applied to the trivial patches tree:
https://github.com/stefanha/qemu/commits/trivial-patches

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-21 11:30, Peter Maydell wrote:
 On 21 July 2012 10:14, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 10:54, Peter Maydell wrote:
 On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-20 21:14, Peter Maydell wrote:
 I'm sure this isn't the only x86ism in the KVM generic source
 files. However the thing I'm specifically trying to do is
 nuke all the uses of kvm_irqchip_in_kernel() in common code,

 No, irqchip in kernel is supposed to be a generic concept. We will
 also have it on Power. Not sure what your plans are for ARM, maybe it
 will always be true there.

 I agree that irqchip in kernel? is generic (though as you'll see
 below there's disagreement about what that ought to mean or imply).
 irq0_override though seems to me to be absolutely x86 specific.

 Naming is x86 specific, semantic not. It means that KVM doesn't prevent
 remapping of IRQs. Granted, I really hope you don't make such mistakes
 in your arch.
 
 What does remapping of IRQs mean here? This is still sounding

It means that the QEMU model of the board can define interrupt routes in
an unconfined way, which is obviously always true when the irqchips are
all in userspace but not necessarily when KVM support is in the loop.

 not very generic to me, in that I really don't know what it would
 mean in an ARM context. The fact that the only caller of this is
 in hw/pc.c is also a big red flag that this isn't exactly generic.

x86 is also still the only arch with full in-kernel irqchip support. And
even if there is only one arch using it, that doesn't mean the test
needs to be moved around - if the test itself is generic, just always
true for other archs.

 
 That said, maybe there is room for discussion about what it means for
 the general KVM code and its users if the irqchip is in the kernel. Two
 things that should be common for every arch:
  - VCPU idle management is done inside the kernel

 The trouble is that at the moment QEMU assumes that is the
 irqchip in kernel? == is VCPU idle management in kernel, for
 instance. For ARM, VCPU idle management is in kernel whether
 we're using the kernel's model of the VGIC or not. Alex tells
 me PPC is the same way. It's only x86 that has tied these two
 concepts together.

 Hmm, and why does Power work despite this mismatch?
 
 I think because hw/ppc.c:ppc_set_irq() both calls cpu_interrupt()
 and also kvmppc_set_interrupt(), so we end up with a sort of odd
 mix of both models... Alex?
 
 If cpu_thread_is_idle doesn't work for you, define something like
 kvm_idle_in_kernel() to replace kvm_irqchip_in_kernel here.
 
 Yes, this is kind of where I'm headed. I thought I'd start with this
 patch as the easiest one first, though...

Before moving anything, let's refine/break up the semantics of
kvm_irqchip_in_kernel first.

 
 The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel()
 is because I think they're all similar to this -- the common code is
 using the check as a proxy for something else, and it should be directly
 asking about that something else. The only bits of code that should
 care about is the irqchip in kernel? are:
  * target-specific device/machine setup code which needs to know
which apic/etc to instantiate
  * target-specific x86 code which has this weird synchronous IRQ
delivery model for irqchip-not-in-kernel
 (Obviously I might have missed something, I'm flailing around
 trying to understand this code :-))

  - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

 The latter point implies that irqfd is available and that interrupt
 routes from virtual IRQs (*) (like the one associated with an irqfd) to
 the in-kernel IRQ controller have to be established. That's pretty generic.

 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd

 You could, thought this doesn't make much sense.
 
 Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take
 advantage of the hardware support for a virtual GIC, and you can use
 the virtual timer support too. These are both big performance advantages
 even if QEMU never does anything with irqfds. (In fact the current
 ARM KVM VGIC code doesn't support irqfds as far as I can see from
 a quick scan of the kernel code.)

It doesn't make sense as it means your in-kernel irqchip model is
semi-finished. If you didn't consider how to support direct in-kernel
IRQ injections, you risk designing something that requires userspace
quirk handling later on when extending it to full-featured in-kernel
irqchip support. Of course, you are free to do this stepwise, but your
digging through QEMU /wrt x86-specifics in the KVM layer should warn you
about the risks. ;)

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Peter Maydell
On 21 July 2012 10:44, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 11:30, Peter Maydell wrote:
 On 21 July 2012 10:14, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 10:54, Peter Maydell wrote:
 On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote:
 Naming is x86 specific, semantic not. It means that KVM doesn't prevent
 remapping of IRQs. Granted, I really hope you don't make such mistakes
 in your arch.

 What does remapping of IRQs mean here?

 It means that the QEMU model of the board can define interrupt routes in
 an unconfined way, which is obviously always true when the irqchips are
 all in userspace but not necessarily when KVM support is in the loop.

Er, surely it's only false if your KVM irqchip is obviously broken?
I mean, any sane in-kernel-irqchip model needs to support the same set
of incoming irq lines as the out-of-kernel irqchip model it's replacing.
There's no difference in flexibility there.

Or are you trying to talk about defining interrupt routes when the
source and destination are both kernel code but the route needs to
be set by userspace (ie is machine specific not cpu specific)?
Whether that's possible sounds to me like it would depend on all
the board model code between the source and destination rather
than being a single global boolean check.

 not very generic to me, in that I really don't know what it would
 mean in an ARM context. The fact that the only caller of this is
 in hw/pc.c is also a big red flag that this isn't exactly generic.

 x86 is also still the only arch with full in-kernel irqchip support. And
 even if there is only one arch using it, that doesn't mean the test
 needs to be moved around - if the test itself is generic, just always
 true for other archs.

I just don't see why this check is anything other than do I have
a broken x86 kernel irqchip? In particular it doesn't have any
documented semantics defined in generic terms that you could usefully
use in general QEMU code to say do I need to call this function
and what should I be doing based on the result?

  - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

 The latter point implies that irqfd is available and that interrupt
 routes from virtual IRQs (*) (like the one associated with an irqfd) to
 the in-kernel IRQ controller have to be established. That's pretty 
 generic.

 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd

 You could, thought this doesn't make much sense.

 Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take
 advantage of the hardware support for a virtual GIC, and you can use
 the virtual timer support too. These are both big performance advantages
 even if QEMU never does anything with irqfds. (In fact the current
 ARM KVM VGIC code doesn't support irqfds as far as I can see from
 a quick scan of the kernel code.)

 It doesn't make sense as it means your in-kernel irqchip model is
 semi-finished. If you didn't consider how to support direct in-kernel
 IRQ injections, you risk designing something that requires userspace
 quirk handling later on when extending it to full-featured in-kernel
 irqchip support.

Well, the in-kernel virtual timer already does direct in-kernel IRQ
injection to the VGIC: it calls a function to say inject IRQ X...
(this works because the interrupt line used is fixed by the CPU,
it's not a board model property so there is no need for the routing
to be defined by userspace. (ie both ends of this irq injection are
in the CPU proper.))

As far as I can tell you seem to be saying irqfds are an extra
feature you might want later, which is exactly you can have
an irqchip without them.

(Is there some summary of the design of the irqfds stuff somewhere I
can go and read up?)

thanks
-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-21 11:56, Peter Maydell wrote:
 On 21 July 2012 10:44, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 11:30, Peter Maydell wrote:
 On 21 July 2012 10:14, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 10:54, Peter Maydell wrote:
 On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote:
 Naming is x86 specific, semantic not. It means that KVM doesn't prevent
 remapping of IRQs. Granted, I really hope you don't make such mistakes
 in your arch.

 What does remapping of IRQs mean here?

 It means that the QEMU model of the board can define interrupt routes in
 an unconfined way, which is obviously always true when the irqchips are
 all in userspace but not necessarily when KVM support is in the loop.
 
 Er, surely it's only false if your KVM irqchip is obviously broken?
 I mean, any sane in-kernel-irqchip model needs to support the same set
 of incoming irq lines as the out-of-kernel irqchip model it's replacing.
 There's no difference in flexibility there.
 
 Or are you trying to talk about defining interrupt routes when the
 source and destination are both kernel code but the route needs to
 be set by userspace (ie is machine specific not cpu specific)?

It describes this requirement primarily.

 Whether that's possible sounds to me like it would depend on all
 the board model code between the source and destination rather
 than being a single global boolean check.

It depends on the feature set of the in-kernel irqchips and if this can
possibly vary on real hw.

 
 not very generic to me, in that I really don't know what it would
 mean in an ARM context. The fact that the only caller of this is
 in hw/pc.c is also a big red flag that this isn't exactly generic.

 x86 is also still the only arch with full in-kernel irqchip support. And
 even if there is only one arch using it, that doesn't mean the test
 needs to be moved around - if the test itself is generic, just always
 true for other archs.
 
 I just don't see why this check is anything other than do I have
 a broken x86 kernel irqchip? In particular it doesn't have any
 documented semantics defined in generic terms that you could usefully
 use in general QEMU code to say do I need to call this function
 and what should I be doing based on the result?

Let's look at this again when we redefined the KVM test functions. Maybe
then the infrastructure is available to move it cleanly. Maybe it will
continue to not matter and we can leave it alone.

 
  - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

 The latter point implies that irqfd is available and that interrupt
 routes from virtual IRQs (*) (like the one associated with an irqfd) to
 the in-kernel IRQ controller have to be established. That's pretty 
 generic.

 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd

 You could, thought this doesn't make much sense.

 Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take
 advantage of the hardware support for a virtual GIC, and you can use
 the virtual timer support too. These are both big performance advantages
 even if QEMU never does anything with irqfds. (In fact the current
 ARM KVM VGIC code doesn't support irqfds as far as I can see from
 a quick scan of the kernel code.)

 It doesn't make sense as it means your in-kernel irqchip model is
 semi-finished. If you didn't consider how to support direct in-kernel
 IRQ injections, you risk designing something that requires userspace
 quirk handling later on when extending it to full-featured in-kernel
 irqchip support.
 
 Well, the in-kernel virtual timer already does direct in-kernel IRQ
 injection to the VGIC: it calls a function to say inject IRQ X...
 (this works because the interrupt line used is fixed by the CPU,
 it's not a board model property so there is no need for the routing
 to be defined by userspace. (ie both ends of this irq injection are
 in the CPU proper.))

Could you inject IRQs from an in-kernel helper that (partially or fully)
emulates some device sitting on peripheral buses as well (like PCI)? If
not, you aren't done with the in-kernel irqchip model yet.

 
 As far as I can tell you seem to be saying irqfds are an extra
 feature you might want later, which is exactly you can have
 an irqchip without them.

Once the prerequisites for peripheral interrupt injections are there,
enabling irqfd for your arch should be straightforward. I'm picking on
those prerequisites here, not irqfd.

 
 (Is there some summary of the design of the irqfds stuff somewhere I
 can go and read up?)

linux/Documentation/virtual/kvm/api.txt is a good start, though not
really a high-level summary.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v7] kvm: notify host when the guest is panicked

2012-07-21 Thread Sasha Levin
On 07/21/2012 09:12 AM, Wen Congyang wrote:
 +#define KVM_PV_PORT  (0x505UL)
 +
  #ifdef __KERNEL__
  #include asm/processor.h
  
 @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
  }
  #endif
  
 +static inline unsigned int kvm_arch_pv_features(void)
 +{
 + return inl(KVM_PV_PORT);
 +}
 +

Why is this safe?

I'm not sure you can just pick any ioport you'd like and use it.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Peter Maydell
On 21 July 2012 11:22, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 11:56, Peter Maydell wrote:
 Or are you trying to talk about defining interrupt routes when the
 source and destination are both kernel code but the route needs to
 be set by userspace (ie is machine specific not cpu specific)?

 It describes this requirement primarily.

 Whether that's possible sounds to me like it would depend on all
 the board model code between the source and destination rather
 than being a single global boolean check.

 It depends on the feature set of the in-kernel irqchips and if this can
 possibly vary on real hw.

If the interrupt route is on-CPU then its routing is fixed (for
that CPU), and you don't need to care about irqfds because the
kernel knows what CPU it's providing to the guest, has both ends
of the connection and can just do the right thing however is most
convenient for the internal implementation. If the interrupt route
is off-CPU then all bets are off because the routing is machine
specific and could go through any kind of logic between the peripheral
and the CPU's irqchip.

I don't see how you can do this with QEMU's current IRQ infrastructure,
which basically just hardwires everything with qemu_irq lines and
doesn't provide any way to query the routing and logic from an
irq source to its destination.

 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd

 You could, thought this doesn't make much sense.

 Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take
 advantage of the hardware support for a virtual GIC, and you can use
 the virtual timer support too. These are both big performance advantages
 even if QEMU never does anything with irqfds. (In fact the current
 ARM KVM VGIC code doesn't support irqfds as far as I can see from
 a quick scan of the kernel code.)

 It doesn't make sense as it means your in-kernel irqchip model is
 semi-finished. If you didn't consider how to support direct in-kernel
 IRQ injections, you risk designing something that requires userspace
 quirk handling later on when extending it to full-featured in-kernel
 irqchip support.

 Well, the in-kernel virtual timer already does direct in-kernel IRQ
 injection to the VGIC: it calls a function to say inject IRQ X...
 (this works because the interrupt line used is fixed by the CPU,
 it's not a board model property so there is no need for the routing
 to be defined by userspace. (ie both ends of this irq injection are
 in the CPU proper.))

 Could you inject IRQs from an in-kernel helper that (partially or fully)
 emulates some device sitting on peripheral buses as well (like PCI)? If
 not, you aren't done with the in-kernel irqchip model yet.

This is still sounding like there is an extra feature which you should
probably implement at some point and should certainly design with the
intention of supporting, not you cannot have an irqchip without irqfds.

Therefore QEMU code which cares about irqfds should be doing
checks for irqfd functionality, not is there an in kernel
irqchip.

 As far as I can tell you seem to be saying irqfds are an extra
 feature you might want later, which is exactly you can have
 an irqchip without them.

 Once the prerequisites for peripheral interrupt injections are there,
 enabling irqfd for your arch should be straightforward. I'm picking on
 those prerequisites here, not irqfd.


 (Is there some summary of the design of the irqfds stuff somewhere I
 can go and read up?)

 linux/Documentation/virtual/kvm/api.txt is a good start, though not
 really a high-level summary.

I looked for 'irqfd' in that and found a straightforward ioctl for
wire this eventfd up to this irqchip input. That doesn't say anything
about remapping of IRQs, and it's not clear to me either why a
straightforward use an ioctl to deliver incoming interrupts design
would be broken by adding that later: it's just a different source
for the interrupt...

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-21 12:53, Peter Maydell wrote:
 On 21 July 2012 11:22, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 11:56, Peter Maydell wrote:
 Or are you trying to talk about defining interrupt routes when the
 source and destination are both kernel code but the route needs to
 be set by userspace (ie is machine specific not cpu specific)?

 It describes this requirement primarily.

 Whether that's possible sounds to me like it would depend on all
 the board model code between the source and destination rather
 than being a single global boolean check.

 It depends on the feature set of the in-kernel irqchips and if this can
 possibly vary on real hw.
 
 If the interrupt route is on-CPU then its routing is fixed (for
 that CPU), and you don't need to care about irqfds because the
 kernel knows what CPU it's providing to the guest, has both ends
 of the connection and can just do the right thing however is most
 convenient for the internal implementation. If the interrupt route
 is off-CPU then all bets are off because the routing is machine
 specific and could go through any kind of logic between the peripheral
 and the CPU's irqchip.
 
 I don't see how you can do this with QEMU's current IRQ infrastructure,
 which basically just hardwires everything with qemu_irq lines and
 doesn't provide any way to query the routing and logic from an
 irq source to its destination.

Routing from arbitrary sources to the in-kernel sink, skipping
intermediate steps in the hotpath is in fact an unsolved issue in QEMU.
We are just introducing band-aid for PCI on x86 (to introduce PCI device
assignment). Long-term, this requires a generic solution which allows
path discovery etc. But this is a userspace problem, nothing related to
the KVM kernel features.

 
 But you can perfectly well have an in-kernel-irqchip that doesn't
 support irqfd

 You could, thought this doesn't make much sense.

 Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take
 advantage of the hardware support for a virtual GIC, and you can use
 the virtual timer support too. These are both big performance advantages
 even if QEMU never does anything with irqfds. (In fact the current
 ARM KVM VGIC code doesn't support irqfds as far as I can see from
 a quick scan of the kernel code.)

 It doesn't make sense as it means your in-kernel irqchip model is
 semi-finished. If you didn't consider how to support direct in-kernel
 IRQ injections, you risk designing something that requires userspace
 quirk handling later on when extending it to full-featured in-kernel
 irqchip support.

 Well, the in-kernel virtual timer already does direct in-kernel IRQ
 injection to the VGIC: it calls a function to say inject IRQ X...
 (this works because the interrupt line used is fixed by the CPU,
 it's not a board model property so there is no need for the routing
 to be defined by userspace. (ie both ends of this irq injection are
 in the CPU proper.))

 Could you inject IRQs from an in-kernel helper that (partially or fully)
 emulates some device sitting on peripheral buses as well (like PCI)? If
 not, you aren't done with the in-kernel irqchip model yet.
 
 This is still sounding like there is an extra feature which you should
 probably implement at some point and should certainly design with the
 intention of supporting, not you cannot have an irqchip without irqfds.
 
 Therefore QEMU code which cares about irqfds should be doing
 checks for irqfd functionality, not is there an in kernel
 irqchip.

Defining some kvm_irqfd_available() is one thing. Ignoring irqfd for
now while introducing in-kernel irqchip is another, less wise decision.

 As far as I can tell you seem to be saying irqfds are an extra
 feature you might want later, which is exactly you can have
 an irqchip without them.

 Once the prerequisites for peripheral interrupt injections are there,
 enabling irqfd for your arch should be straightforward. I'm picking on
 those prerequisites here, not irqfd.


 (Is there some summary of the design of the irqfds stuff somewhere I
 can go and read up?)

 linux/Documentation/virtual/kvm/api.txt is a good start, though not
 really a high-level summary.
 
 I looked for 'irqfd' in that and found a straightforward ioctl for
 wire this eventfd up to this irqchip input. That doesn't say anything
 about remapping of IRQs, and it's not clear to me either why a
 straightforward use an ioctl to deliver incoming interrupts design
 would be broken by adding that later: it's just a different source
 for the interrupt...

Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE),
adding irqfd is in fact simple.

Jan



signature.asc
Description: OpenPGP digital signature


Re: qemu fails to build with glibc-2.15

2012-07-21 Thread X O
On Mon, Jul 16, 2012 at 2:37 PM, X O bex...@gmail.com wrote:
 Hello,

 I suspect upgrading my system to glibc-2.15 was a mistake. It seems to
 be qemu-1.0.1, and latter versions including qemu-1.1.1, can't be
 compiled anymore. Yes, I did search around and that led me to glibc,
 resp. http://sourceware.org/ml/libc-ports/2011-08/msg00019.html

 Please, could somebody confirm or deny the following error is thanks
 to glibc-2.15?
 Is there a way to compile qemu with glibc-2.15?
 Or is my system broken?

 ~~~
   CCi386-linux-user/syscall.o
 /tmp/qemu-1.1.1/linux-user/syscall.c: In function 'sys_getdents':
 /tmp/qemu-1.1.1/linux-user/syscall.c:217:1: error: '__NR_getdents'
 undeclared (first use in this function)
 /tmp/qemu-1.1.1/linux-user/syscall.c:217:1: note: each undeclared
 identifier is reported only once for each function it appears in
 /tmp/qemu-1.1.1/linux-user/syscall.c: In function '_llseek':
 /tmp/qemu-1.1.1/linux-user/syscall.c:224:1: error: '__NR_lseek'
 undeclared (first use in this function)
 /tmp/qemu-1.1.1/linux-user/syscall.c: At top level:
 /tmp/qemu-1.1.1/linux-user/syscall.c:358:12: warning: 'sys_futimesat'
 defined but not used [-Wunused-function]
 /tmp/qemu-1.1.1/linux-user/syscall.c: In function '_llseek':
 /tmp/qemu-1.1.1/linux-user/syscall.c:223:1: warning: control reaches
 end of non-void function [-Wreturn-type]
 /tmp/qemu-1.1.1/linux-user/syscall.c: In function 'sys_getdents':
 /tmp/qemu-1.1.1/linux-user/syscall.c:217:1: warning: control reaches
 end of non-void function [-Wreturn-type]
 ~~~

 Thanks,
 bexoff

I meant to sent this out as soon as I found out, but it seems I didn't.

Yeah, my system was broken, resp. /usr/include/ was. As soon as I had
qemu back up, I put some time and effort to find out why.
Conclusion is - qemu-1.1.1 can be compiled just fine with glibc-2.15
and on x86_64.

Sorry for inconvenience and thanks for all help.

--
bexoff
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next RFC V5 4/5] virtio_net: multiqueue support

2012-07-21 Thread Sasha Levin
On 07/20/2012 03:40 PM, Michael S. Tsirkin wrote:
 -err = init_vqs(vi);
  +  if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
  +  vi-has_cvq = true;
  +
 How about we disable multiqueue if there's no cvq?
 Will make logic a bit simpler, won't it?

multiqueues don't really depend on cvq. Does this added complexity really 
justifies adding an artificial limit?


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-21 14:17, Peter Maydell wrote:
 On 21 July 2012 12:08, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 12:53, Peter Maydell wrote:
 This is still sounding like there is an extra feature which you should
 probably implement at some point and should certainly design with the
 intention of supporting, not you cannot have an irqchip without irqfds.

 Therefore QEMU code which cares about irqfds should be doing
 checks for irqfd functionality, not is there an in kernel
 irqchip.

 Defining some kvm_irqfd_available() is one thing. Ignoring irqfd for
 now while introducing in-kernel irqchip is another, less wise decision.
 
 You still haven't really explained why we can't just ignore irqfd
 for now. I don't see how it would particularly affect the design
 of the kernel implementation very much, and the userspace interface
 seems to just be an extra ioctl.

I bet you ignored MSI so far. Physical IRQ lines are just a part of the
whole picture. How are MSIs delivered on the systems you want to add?

 
 Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE),
 adding irqfd is in fact simple.
 
 I don't really understand where KVM_SET_GSI_ROUTING comes into
 this -- the documentation is a bit opaque. It says Sets the GSI
 routing table entries but it doesn't define what a GSI is or
 what we're routing to where. Googling suggests GSI is an APIC
 specific term so it doesn't sound like it's relevant for non-x86.

As I said before: GSI needs to be read as physical or virtual IRQ
line. The virtual ones can be of any source you define, irqfd is just one.

 
 I'm not sure what KVM_IRQ_LINE has to do with it either -- this
 is just the standard interrupt injection ioctl. KVM ARM supports
 this whether there's an in-kernel irqchip or not.

KVM_IRQ_LINE injects according to the routing defined via
KVM_SET_GSI_ROUTING, at least on x86 (and ia64). If you define your own
KVM_IRQ_LINE semantics, you may have problems later on when you need to
redefine it after adding IRQ routing support.

Even if you don't want to add full irqchip support now, plan for it.
Define the so far used interfaces in a way that they will work
seamlessly when adding the missing bits and pieces. However, I still
think it's better to skip the intermediate steps in order to avoid
planning mistakes.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Peter Maydell
On 21 July 2012 13:35, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 14:17, Peter Maydell wrote:
 You still haven't really explained why we can't just ignore irqfd
 for now. I don't see how it would particularly affect the design
 of the kernel implementation very much, and the userspace interface
 seems to just be an extra ioctl.

 I bet you ignored MSI so far. Physical IRQ lines are just a part of the
 whole picture. How are MSIs delivered on the systems you want to add?

You're using random acronyms without defining them again. It looks
as if MSI is a PCI specific term. That would seem to me to fall
under the heading of routing across a board model which we can't
do anyway, because you have no idea how this all wired up, it
will depend on the details of the SoC and the PCI controller.
(As it happens the initial board model doesn't have PCI support;
most ARM boards don't.) I'm not entirely sure we want to have
in kernel random SoC-specific PCI controller...

[Point taken that thought is required here, though.]

 Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE),
 adding irqfd is in fact simple.

 I don't really understand where KVM_SET_GSI_ROUTING comes into
 this -- the documentation is a bit opaque. It says Sets the GSI
 routing table entries but it doesn't define what a GSI is or
 what we're routing to where. Googling suggests GSI is an APIC
 specific term so it doesn't sound like it's relevant for non-x86.

 As I said before: GSI needs to be read as physical or virtual IRQ
 line. The virtual ones can be of any source you define, irqfd is just one.

What's a virtual irq line in this context? We're modelling a physical
bit of hardware which has N interrupt lines, so I'm not sure what
a virtual irq line would be or how it would appear to the guest...

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386

2012-07-21 Thread Jan Kiszka
On 2012-07-21 14:57, Peter Maydell wrote:
 On 21 July 2012 13:35, Jan Kiszka jan.kis...@web.de wrote:
 On 2012-07-21 14:17, Peter Maydell wrote:
 You still haven't really explained why we can't just ignore irqfd
 for now. I don't see how it would particularly affect the design
 of the kernel implementation very much, and the userspace interface
 seems to just be an extra ioctl.

 I bet you ignored MSI so far. Physical IRQ lines are just a part of the
 whole picture. How are MSIs delivered on the systems you want to add?
 
 You're using random acronyms without defining them again. It looks
 as if MSI is a PCI specific term. That would seem to me to fall
 under the heading of routing across a board model which we can't
 do anyway, because you have no idea how this all wired up, it
 will depend on the details of the SoC and the PCI controller.

For sure you can. You need to discover those wiring, cache it, and then
let the source inject to the final destination directly. See the INTx
routing notifier and pci_device_route_intx_to_irq from [1] for that
simplistic approach we are taking on the x86/PC architecture.

 (As it happens the initial board model doesn't have PCI support;
 most ARM boards don't.) I'm not entirely sure we want to have
 in kernel random SoC-specific PCI controller...

Isn't ARM going after server scenarios as well? Will be hard without
some PCI support. The good news is that you likely won't need a full
in-kernel PCI model for this (we don't do so on x86 as well).

 
 [Point taken that thought is required here, though.]
 
 Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE),
 adding irqfd is in fact simple.

 I don't really understand where KVM_SET_GSI_ROUTING comes into
 this -- the documentation is a bit opaque. It says Sets the GSI
 routing table entries but it doesn't define what a GSI is or
 what we're routing to where. Googling suggests GSI is an APIC
 specific term so it doesn't sound like it's relevant for non-x86.

 As I said before: GSI needs to be read as physical or virtual IRQ
 line. The virtual ones can be of any source you define, irqfd is just one.
 
 What's a virtual irq line in this context? We're modelling a physical
 bit of hardware which has N interrupt lines, so I'm not sure what
 a virtual irq line would be or how it would appear to the guest...

A virtual line is an input of the in-kernel IRQ router you configure via
SET_GSI_ROUTING. A physical line is a potential output of it that goes
into some in-kernel interrupt controller model. It can also be an
interrupt message sent to a specific CPU - provided the arch supports
such a delivery protocol.

Of course, the current router was modeled after x86 and ia64. So I
wouldn't be surprised if some ARM system configuration cannot be
expressed this way. Then we need to discuss reasonable extensions. But
it should provide a sound foundation at least.

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/160792



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation

2012-07-21 Thread Andy Walls
On Thu, 2012-07-19 at 14:15 -0700, Tejun Heo wrote:
 From c9bba34243a86fb3ac82d1bdd0ce4bf796b79559 Mon Sep 17 00:00:00 2001
 From: Tejun Heo t...@kernel.org
 Date: Thu, 19 Jul 2012 13:52:53 -0700
 
 Make the following two non-functional changes.
 
 * Separate out insert_kthread_work() from queue_kthread_work().
 
 * Relocate struct kthread_flush_work and kthread_flush_work_fn()
   definitions above flush_kthread_work().
 
 Signed-off-by: Tejun Heo t...@kernel.org
 ---
  kernel/kthread.c |   40 
  1 files changed, 24 insertions(+), 16 deletions(-)
 
 diff --git a/kernel/kthread.c b/kernel/kthread.c
 index 3d3de63..7b8a678 100644
 --- a/kernel/kthread.c
 +++ b/kernel/kthread.c
 @@ -378,6 +378,17 @@ repeat:
  }
  EXPORT_SYMBOL_GPL(kthread_worker_fn);
  
 +/* insert @work before @pos in @worker */

Hi Tejun,

Would a comment that the caller should be holding worker-lock be useful
here?  Anyway, comment or not:

Acked-by: Andy Walls aw...@md.metrocast.net

Regards,
Andy

 +static void insert_kthread_work(struct kthread_worker *worker,
 +struct kthread_work *work,
 +struct list_head *pos)
 +{
 + list_add_tail(work-node, pos);
 + work-queue_seq++;
 + if (likely(worker-task))
 + wake_up_process(worker-task);
 +}
 +
  /**
   * queue_kthread_work - queue a kthread_work
   * @worker: target kthread_worker
 @@ -395,10 +406,7 @@ bool queue_kthread_work(struct kthread_worker *worker,
  
   spin_lock_irqsave(worker-lock, flags);
   if (list_empty(work-node)) {
 - list_add_tail(work-node, worker-work_list);
 - work-queue_seq++;
 - if (likely(worker-task))
 - wake_up_process(worker-task);
 + insert_kthread_work(worker, work, worker-work_list);
   ret = true;
   }
   spin_unlock_irqrestore(worker-lock, flags);
 @@ -406,6 +414,18 @@ bool queue_kthread_work(struct kthread_worker *worker,
  }
  EXPORT_SYMBOL_GPL(queue_kthread_work);
  
 +struct kthread_flush_work {
 + struct kthread_work work;
 + struct completion   done;
 +};
 +
 +static void kthread_flush_work_fn(struct kthread_work *work)
 +{
 + struct kthread_flush_work *fwork =
 + container_of(work, struct kthread_flush_work, work);
 + complete(fwork-done);
 +}
 +
  /**
   * flush_kthread_work - flush a kthread_work
   * @work: work to flush
 @@ -436,18 +456,6 @@ void flush_kthread_work(struct kthread_work *work)
  }
  EXPORT_SYMBOL_GPL(flush_kthread_work);
  
 -struct kthread_flush_work {
 - struct kthread_work work;
 - struct completion   done;
 -};
 -
 -static void kthread_flush_work_fn(struct kthread_work *work)
 -{
 - struct kthread_flush_work *fwork =
 - container_of(work, struct kthread_flush_work, work);
 - complete(fwork-done);
 -}
 -
  /**
   * flush_kthread_worker - flush all current works on a kthread_worker
   * @worker: worker to flush


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed

2012-07-21 Thread Andy Walls
On Thu, 2012-07-19 at 14:16 -0700, Tejun Heo wrote:
 From 06f9a06f4aeecdb9d07014713ab41b548ae219b5 Mon Sep 17 00:00:00 2001
 From: Tejun Heo t...@kernel.org
 Date: Thu, 19 Jul 2012 13:52:53 -0700
 
 kthread_worker provides minimalistic workqueue-like interface for
 users which need a dedicated worker thread (e.g. for realtime
 priority).  It has basic queue, flush_work, flush_worker operations
 which mostly match the workqueue counterparts; however, due to the way
 flush_work() is implemented, it has a noticeable difference of not
 allowing work items to be freed while being executed.
 
 While the current users of kthread_worker are okay with the current
 behavior, the restriction does impede some valid use cases.  Also,
 removing this difference isn't difficult and actually makes the code
 easier to understand.
 
 This patch reimplements flush_kthread_work() such that it uses a
 flush_work item instead of queue/done sequence numbers.
 
 Signed-off-by: Tejun Heo t...@kernel.org
 ---
  include/linux/kthread.h |8 +-
  kernel/kthread.c|   48 ++
  2 files changed, 29 insertions(+), 27 deletions(-)

Hi Tejun,

I have a question and comment below.

 diff --git a/include/linux/kthread.h b/include/linux/kthread.h
 index 0714b24..22ccf9d 100644
 --- a/include/linux/kthread.h
 +++ b/include/linux/kthread.h
 @@ -49,8 +49,6 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
   * can be queued and flushed using queue/flush_kthread_work()
   * respectively.  Queued kthread_works are processed by a kthread
   * running kthread_worker_fn().
 - *
 - * A kthread_work can't be freed while it is executing.
   */
  struct kthread_work;
  typedef void (*kthread_work_func_t)(struct kthread_work *work);
 @@ -59,15 +57,14 @@ struct kthread_worker {
   spinlock_t  lock;
   struct list_headwork_list;
   struct task_struct  *task;
 + struct kthread_work *current_work;
  };
  
  struct kthread_work {
   struct list_headnode;
   kthread_work_func_t func;
   wait_queue_head_t   done;
 - atomic_tflushing;
 - int queue_seq;
 - int done_seq;
 + struct kthread_worker   *worker;
  };
  
  #define KTHREAD_WORKER_INIT(worker)  {   \
 @@ -79,7 +76,6 @@ struct kthread_work {
   .node = LIST_HEAD_INIT((work).node),\
   .func = (fn),   \
   .done = __WAIT_QUEUE_HEAD_INITIALIZER((work).done), \
 - .flushing = ATOMIC_INIT(0), \
   }
  
  #define DEFINE_KTHREAD_WORKER(worker)
 \
 diff --git a/kernel/kthread.c b/kernel/kthread.c
 index 7b8a678..4034b2b 100644
 --- a/kernel/kthread.c
 +++ b/kernel/kthread.c
 @@ -360,16 +360,12 @@ repeat:
   struct kthread_work, node);
   list_del_init(work-node);
   }
 + worker-current_work = work;
   spin_unlock_irq(worker-lock);
  
   if (work) {
   __set_current_state(TASK_RUNNING);
   work-func(work);

If the call to 'work-func(work);' frees the memory pointed to by
'work', 'worker-current_work' points to deallocated memory.
So 'worker-current_work' will only ever used as a unique 'work'
identifier to handle, correct?


 - smp_wmb();  /* wmb worker-b0 paired with flush-b1 */
 - work-done_seq = work-queue_seq;
 - smp_mb();   /* mb worker-b1 paired with flush-b0 */
 - if (atomic_read(work-flushing))
 - wake_up_all(work-done);
   } else if (!freezing(current))
   schedule();
  
 @@ -384,7 +380,7 @@ static void insert_kthread_work(struct kthread_worker 
 *worker,
  struct list_head *pos)
  {
   list_add_tail(work-node, pos);
 - work-queue_seq++;
 + work-worker = worker;
   if (likely(worker-task))
   wake_up_process(worker-task);
  }
 @@ -434,25 +430,35 @@ static void kthread_flush_work_fn(struct kthread_work 
 *work)
   */
  void flush_kthread_work(struct kthread_work *work)
  {
 - int seq = work-queue_seq;
 + struct kthread_flush_work fwork = {
 + KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
 + COMPLETION_INITIALIZER_ONSTACK(fwork.done),
 + };
 + struct kthread_worker *worker;
 + bool noop = false;
 +

You might want a check for 'work == NULL' here, to gracefully handle
code like the following:

void driver_work_handler(struct kthread_work *work)
{
...
kfree(work);
}

struct kthread_work *driver_queue_batch(void)
{
struct kthread_work *work = NULL;
...
while (driver_more_stuff_todo()) {
work = kzalloc(sizeof(struct kthread work), GFP_WHATEVER);

Re: [RFC-v4 1/3] vhost: Separate vhost-net features from vhost features

2012-07-21 Thread Michael S. Tsirkin
On Sat, Jul 21, 2012 at 06:55:36AM +, Nicholas A. Bellinger wrote:
 From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 
 In order for other vhost devices to use the VHOST_FEATURES bits the
 vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
 constant.
 
 (Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES)
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Zhi Yong Wu wu...@cn.ibm.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Asias He as...@redhat.com
 Signed-off-by: Nicholas A. Bellinger n...@risingtidesystems.com

Applied to vhost-next.

 ---
  drivers/vhost/net.c   |4 ++--
  drivers/vhost/test.c  |4 ++--
  drivers/vhost/vhost.h |3 ++-
  3 files changed, 6 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index f82a739..072cbba 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned 
 int ioctl,
   return -EFAULT;
   return vhost_net_set_backend(n, backend.index, backend.fd);
   case VHOST_GET_FEATURES:
 - features = VHOST_FEATURES;
 + features = VHOST_NET_FEATURES;
   if (copy_to_user(featurep, features, sizeof features))
   return -EFAULT;
   return 0;
   case VHOST_SET_FEATURES:
   if (copy_from_user(features, featurep, sizeof features))
   return -EFAULT;
 - if (features  ~VHOST_FEATURES)
 + if (features  ~VHOST_NET_FEATURES)
   return -EOPNOTSUPP;
   return vhost_net_set_features(n, features);
   case VHOST_RESET_OWNER:
 diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
 index 3de00d9..91d6f06 100644
 --- a/drivers/vhost/test.c
 +++ b/drivers/vhost/test.c
 @@ -261,14 +261,14 @@ static long vhost_test_ioctl(struct file *f, unsigned 
 int ioctl,
   return -EFAULT;
   return vhost_test_run(n, test);
   case VHOST_GET_FEATURES:
 - features = VHOST_FEATURES;
 + features = VHOST_NET_FEATURES;
   if (copy_to_user(featurep, features, sizeof features))
   return -EFAULT;
   return 0;
   case VHOST_SET_FEATURES:
   if (copy_from_user(features, featurep, sizeof features))
   return -EFAULT;
 - if (features  ~VHOST_FEATURES)
 + if (features  ~VHOST_NET_FEATURES)
   return -EOPNOTSUPP;
   return vhost_test_set_features(n, features);
   case VHOST_RESET_OWNER:
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 8de1fd5..07b9763 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -201,7 +201,8 @@ enum {
   VHOST_FEATURES = (1ULL  VIRTIO_F_NOTIFY_ON_EMPTY) |
(1ULL  VIRTIO_RING_F_INDIRECT_DESC) |
(1ULL  VIRTIO_RING_F_EVENT_IDX) |
 -  (1ULL  VHOST_F_LOG_ALL) |
 +  (1ULL  VHOST_F_LOG_ALL),
 + VHOST_NET_FEATURES = VHOST_FEATURES |
(1ULL  VHOST_NET_F_VIRTIO_NET_HDR) |
(1ULL  VIRTIO_NET_F_MRG_RXBUF),
  };
 -- 
 1.7.2.5
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v4 2/3] vhost: make vhost work queue visible

2012-07-21 Thread Michael S. Tsirkin
On Sat, Jul 21, 2012 at 06:55:37AM +, Nicholas A. Bellinger wrote:
 From: Stefan Hajnoczi stefa...@gmail.com
 
 The vhost work queue allows processing to be done in vhost worker thread
 context, which uses the owner process mm.  Access to the vring and guest
 memory is typically only possible from vhost worker context so it is
 useful to allow work to be queued directly by users.
 
 Currently vhost_net only uses the poll wrappers which do not expose the
 work queue functions.  However, for tcm_vhost (vhost_scsi) it will be
 necessary to queue custom work.
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Zhi Yong Wu wu...@cn.ibm.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org

Applied to vhost-next.

 ---
  drivers/vhost/vhost.c |5 ++---
  drivers/vhost/vhost.h |3 +++
  2 files changed, 5 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 94dbd25..1aab08b 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -64,7 +64,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned 
 mode, int sync,
   return 0;
  }
  
 -static void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 +void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
  {
   INIT_LIST_HEAD(work-node);
   work-fn = fn;
 @@ -137,8 +137,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
   vhost_work_flush(poll-dev, poll-work);
  }
  
 -static inline void vhost_work_queue(struct vhost_dev *dev,
 - struct vhost_work *work)
 +void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
  {
   unsigned long flags;
  
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 07b9763..1125af3 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -43,6 +43,9 @@ struct vhost_poll {
   struct vhost_dev *dev;
  };
  
 +void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 +void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 +
  void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
unsigned long mask, struct vhost_dev *dev);
  void vhost_poll_start(struct vhost_poll *poll, struct file *file);
 -- 
 1.7.2.5
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v4 3/3] tcm_vhost: Initial merge for vhost level target fabric driver

2012-07-21 Thread Michael S. Tsirkin
On Sat, Jul 21, 2012 at 06:55:38AM +, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 This patch adds the initial code for tcm_vhost, a Vhost level TCM
 fabric driver for virtio SCSI initiators into KVM guest.
 
 This code is currently up and running on v3.5-rc2 host+guest along
 with the virtio-scsi vdev-scan() patch to allow a proper
 scsi_scan_host() to occur once the tcm_vhost nexus has been established
 by the paravirtualized virtio-scsi client here:
 
 virtio-scsi: Add vdrv-scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
 http://marc.info/?l=linux-scsim=134160609212542w=2
 
 Using tcm_vhost requires Zhi's - Stefan's qemu vhost-scsi tree here:
 
 https://github.com/wuzhy/qemu/tree/vhost-scsi
 
 along with the recent QEMU patch to hw/virtio-scsi.c to set max_target=0
 during vhost-scsi operation.
 
 Changelog v3 - v4:
 
   Rename vhost_vring_target - vhost_scsi_target (mst + nab)
   Use TRANSPORT_IQN_LEN in vhost_scsi_target-vhost_wwpn[] def (nab)
   Move back to drivers/vhost/, and just use drivers/vhost/Kconfig.tcm (mst)
   Move TCM_VHOST related ioctl defines from include/linux/vhost.h -
   drivers/vhost/tcm_vhost.h as requested by MST (nab)
   Move Kbuild.tcm include from drivers/staging - drivers/vhost/, and
   just use 'if STAGING' around 'source drivers/vhost/Kbuild.tcm'
 
 Changelog v2 - v3:
 
   Unlock on error in tcm_vhost_drop_nexus() (DanC)
   Fix strlen() doesn't count the terminator (DanC)
   Call kfree() on an error path (DanC)
   Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
   Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
   Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
   as requested by MST (nab)
 
 Changelog v1 - v2:
 
   Fix tv_cmd completion - release SGL memory leak (nab)
   Fix sparse warnings for static variable usage ((Fengguang Wu)
   Fix sparse warnings for min() typing + printk format specs (Fengguang Wu)
   Convert to cmwq submission for I/O dispatch (nab + hch)
 
 Changelog v0 - v1:
 
   Merge into single source + header file, and move to drivers/vhost/
 
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Zhi Yong Wu wu...@cn.ibm.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Christoph Hellwig h...@lst.de
 Cc: Hannes Reinecke h...@suse.de
 Cc: Jens Axboe ax...@kernel.dk
 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org

There are some comments that I sent that have not been addressed yet,
in particular it's easy for userspace to flood the host log.
But I think it's in an OK state for a staging driver.

I don't know if the dependency on CONFIG_STAGING is enough:
it's easy for someone to miss, if it was in the menu with other
stagung drivers it woul be more obvious.  If the way to do this is to
move it to drivers/staging/tcp_vhost it might be worth it,  I'm fine
either way.  Anyway, I think we both agree it should go in through
Greg's tree so he's the boss.

I put the dependency patches 1 and 2 on my tree
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next
and sending pull request to dave ASAP.

 ---
  drivers/vhost/Kconfig |3 +
  drivers/vhost/Kconfig.tcm |6 +
  drivers/vhost/Makefile|2 +
  drivers/vhost/tcm_vhost.c | 1611 
 +
  drivers/vhost/tcm_vhost.h |   90 +++
  5 files changed, 1712 insertions(+), 0 deletions(-)
  create mode 100644 drivers/vhost/Kconfig.tcm
  create mode 100644 drivers/vhost/tcm_vhost.c
  create mode 100644 drivers/vhost/tcm_vhost.h
 
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 index e4e2fd1..202bba6 100644
 --- a/drivers/vhost/Kconfig
 +++ b/drivers/vhost/Kconfig
 @@ -9,3 +9,6 @@ config VHOST_NET
 To compile this driver as a module, choose M here: the module will
 be called vhost_net.
  
 +if STAGING
 +source drivers/vhost/Kconfig.tcm
 +endif
 diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
 new file mode 100644
 index 000..a9c6f76
 --- /dev/null
 +++ b/drivers/vhost/Kconfig.tcm
 @@ -0,0 +1,6 @@
 +config TCM_VHOST
 + tristate TCM_VHOST fabric module (EXPERIMENTAL)
 + depends on TARGET_CORE  EVENTFD  EXPERIMENTAL  m
 + default n
 + ---help---
 + Say M here to enable the TCM_VHOST fabric module for use with 
 virtio-scsi guests
 diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
 index 72dd020..a27b053 100644
 --- a/drivers/vhost/Makefile
 +++ b/drivers/vhost/Makefile
 @@ -1,2 +1,4 @@
  obj-$(CONFIG_VHOST_NET) += vhost_net.o
  vhost_net-y := vhost.o net.o
 +
 +obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
 diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
 new file mode 100644
 index 000..dc7e024
 --- /dev/null
 +++ b/drivers/vhost/tcm_vhost.c
 @@ -0,0 +1,1611 @@
 +/***
 + * Vhost kernel TCM fabric driver for virtio SCSI initiators
 + *

RE: [PATCH 2/2 v5] KVM: PPC: booke: Add watchdog emulation

2012-07-21 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, July 21, 2012 2:59 AM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Bhushan 
 Bharat-
 R65777
 Subject: Re: [PATCH 2/2 v5] KVM: PPC: booke: Add watchdog emulation
 
 On 07/20/2012 12:00 AM, Bharat Bhushan wrote:
  This patch adds the watchdog emulation in KVM. The watchdog emulation
  is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_WDT) ioctl.
  The kernel timer are used for watchdog emulation and emulates h/w
  watchdog state machine. On watchdog timer expiry, it exit to QEMU if
  TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how it
  is configured.
 
  Signed-off-by: Liu Yu yu@freescale.com
  Signed-off-by: Scott Wood scottw...@freescale.com
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  [bharat.bhus...@freescale.com: reworked patch]
 
 Typically the [] note goes immediately before your signoff (but after the
 others).

ok

 
  +static void arm_next_watchdog(struct kvm_vcpu *vcpu) {
  +   unsigned long nr_jiffies;
  +
  +   spin_lock(vcpu-arch.wdt_lock);
  +   nr_jiffies = watchdog_next_timeout(vcpu);
  +   /*
  +* If the number of jiffies of watchdog timer = NEXT_TIMER_MAX_DELTA
  +* then do not run the watchdog timer as this can break timer APIs.
  +*/
  +   if (nr_jiffies  NEXT_TIMER_MAX_DELTA)
  +   mod_timer(vcpu-arch.wdt_timer, jiffies + nr_jiffies);
  +   else
  +   del_timer(vcpu-arch.wdt_timer);
  +   spin_unlock(vcpu-arch.wdt_lock);
  +}
 
 This needs to be an irqsave lock.

Ok, I want to understood why irqsave lock should be used here?
irqsave_lock is used when the critical section within lock can be referenced 
from interrupt context also. What part of critical section above can get 
affected from local irq? Is it jiffies here?


 
  @@ -386,13 +387,23 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
  #ifdef CONFIG_KVM_EXIT_TIMING
  mutex_init(vcpu-arch.exit_timing_lock);
   #endif
  -
  +#ifdef CONFIG_BOOKE
  +   spin_lock_init(vcpu-arch.wdt_lock);
  +   /* setup watchdog timer once */
  +   setup_timer(vcpu-arch.wdt_timer, kvmppc_watchdog_func,
  +   (unsigned long)vcpu);
  +#endif
  return 0;
   }
 
 Can you do this in kvmppc_booke_init()?

Ok, so the *_uninit part of code also needed to be moved in respective function.

 
 
   void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)  {
  kvmppc_mmu_destroy(vcpu);
  +#ifdef CONFIG_BOOKE
  +   spin_lock(vcpu-arch.wdt_lock);
  +   del_timer(vcpu-arch.wdt_timer);
  +   spin_unlock(vcpu-arch.wdt_lock);
  +#endif
   }
 
 Don't acquire the lock here, but use del_timer_sync().

Ok.

Thanks
-Bharat

 
 -Scott