Re: [PATCH 2/2] KVM test: Remove image_boot=yes from virtio_blk variant

2010-10-14 Thread pradeep
On Thu, 14 Oct 2010 01:24:12 -0300
Lucas Meneghel Rodrigues l...@redhat.com wrote:

 Recent qemu can boot from virtio devices without boot=on, and
 qemu.git will simply reject the option as invalid. So remove it
 from the default config in tests_base.cfg, and just leave it
 there commented out in case someone is testing older versions.


But older qemu shipped with distros might require boot=on.
It's good to check the qemu version.
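One way a harness could honor that (a hedged sketch; `qemu_version_cmp` and `needs_boot_on` are illustrative names, and the cutoff release is a placeholder, not verified against qemu's history):

```c
#include <stdio.h>

/* Compare two "major.minor.micro" version strings; returns <0, 0, >0
 * like strcmp. Missing components default to 0. */
static int qemu_version_cmp(const char *a, const char *b)
{
    int a1 = 0, a2 = 0, a3 = 0;
    int b1 = 0, b2 = 0, b3 = 0;

    sscanf(a, "%d.%d.%d", &a1, &a2, &a3);
    sscanf(b, "%d.%d.%d", &b1, &b2, &b3);
    if (a1 != b1) return a1 - b1;
    if (a2 != b2) return a2 - b2;
    return a3 - b3;
}

/* Append boot=on only for versions older than some cutoff (placeholder). */
static int needs_boot_on(const char *version, const char *cutoff)
{
    return qemu_version_cmp(version, cutoff) < 0;
}
```

The test harness would feed this the version string parsed out of `qemu -version` and decide per-host whether to keep the commented-out option.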


Thanks
Pradeep
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [AUTOTEST MIRROR][PATCH] Introduce .gitignore file

2010-10-14 Thread Amos Kong
On Wed, Oct 13, 2010 at 12:47 AM, Lucas Meneghel Rodrigues
l...@redhat.com wrote:
 On Tue, 2010-10-12 at 18:24 +0200, Avi Kivity wrote:
 On 10/12/2010 06:18 PM, Lucas Meneghel Rodrigues wrote:
  On Fri, 2010-10-08 at 15:02 -0300, Eduardo Habkost wrote:
    On Thu, Oct 07, 2010 at 05:22:17PM -0300, Luiz Capitulino wrote:
    
      Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
      ---
      Eduardo, can you actually commit to a git mirror? It's hard to
      work in your repo w/o this file.
  
    Can't we get this added to svn:ignore on the Subversion repository
    first? I would like the git mirror to match exactly what's on SVN (with
    svn:ignore being translated to a .gitignore file).
 
  Ok, will ask the folks from google to do that for us.
 
  John, I don't know who is doing the administration of the autotest SVN
  repo, so I am sending this to you. Could you please add the following to
  the svn:ignore list of the repo?
 
  +*.pyc
  +client/control
  +client/results/
  +client/tests/kvm/images
  +client/tests/kvm/env
  +client/tmp
  +client/tests/kvm/*.cfg

 Or just switch to git?

Using git would be more convenient and better.

 This is something Martin asked to wait until he had time to do so. Let's
 see what I can do to help on getting it done in a reasonable time frame.


Frame buffer corruptions with KVM >= 2.6.36

2010-10-14 Thread Jan Kiszka
Hi,

I'm seeing quite frequent corruptions of the VESA frame buffer with
Linux guests (vga=0x317) that are started with KVM kernel modules of
the upcoming 2.6.36 (I'm currently running -rc7). The effect disappears
when downgrading to kvm-kmod-2.6.35.6. I will see if I can bisect later,
but maybe someone already has an idea or wants to reproduce it (just run
something like "find /" on one text console and switch to another one -
text fragments will remain on the screen every few switches).

Jan





Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 10/12/2010 10:39:07 PM:

  Sorry for the delay, I was sick last couple of days. The results
  with your patch are (%'s over original code):
 
  Code   BW%   CPU%   RemoteCPU
  MQ (#txq=16)   31.4% 38.42% 6.41%
  MQ+MST (#txq=16)   28.3% 18.9%  -10.77%
 
  The patch helps CPU utilization but didn't help single stream
  drop.
 
  Thanks,

 What other shared TX/RX locks are there?  In your setup, is the same
 macvtap socket structure used for RX and TX?  If yes this will create
 cacheline bounces as sk_wmem_alloc/sk_rmem_alloc share a cache line,
 there might also be contention on the lock in sk_sleep waitqueue.
 Anything else?

The patch is not introducing any locking (both vhost and virtio-net).
The single stream drop is due to different vhost threads handling the
RX/TX traffic.

I added a heuristic (fuzzy) to determine if more than one flow
is being used on the device, and if not, use vhost[0] for both
tx and rx (vhost_poll_queue figures this out before waking up
the suitable vhost thread).  Testing shows that single stream
performance is as good as the original code.
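A minimal sketch of such a single-flow heuristic (the names, threshold, and decay scheme are illustrative assumptions, not the actual patch code):

```c
/* Hypothetical "single flow?" detector: remember the last flow seen and
 * only after several consecutive flow changes declare the device
 * multi-flow, fanning out to per-queue vhost threads. While traffic
 * looks single-flow, everything is steered to vhost[0]. */
struct flow_detect {
    unsigned last_flow;   /* hash/id of the most recent flow */
    unsigned switches;    /* recent flow changes observed */
};

#define MULTIFLOW_THRESHOLD 4

/* Returns the vhost index to wake: 0 while traffic looks single-flow,
 * otherwise the tx queue's own index. */
static int pick_vhost(struct flow_detect *fd, unsigned flow, int txq)
{
    if (flow != fd->last_flow) {
        fd->last_flow = flow;
        if (fd->switches < MULTIFLOW_THRESHOLD)
            fd->switches++;
    } else if (fd->switches > 0) {
        fd->switches--;   /* decay: fall back to single-flow mode */
    }
    return (fd->switches >= MULTIFLOW_THRESHOLD) ? txq : 0;
}
```

The decay step is what makes the heuristic "fuzzy": a device that goes back to a single stream drifts back to vhost[0] instead of staying fanned out.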

______________________________________________________________________
                    #txqs = 2 (#vhosts = 3)
#     BW1     BW2    (%)       CPU1   CPU2  (%)        RCPU1  RCPU2  (%)
______________________________________________________________________
1     77344   74973  (-3.06)   172    143   (-16.86)   358    324    (-9.49)
2     20924   21107  (.87)     107    103   (-3.73)    220    217    (-1.36)
4     21629   32911  (52.16)   214    391   (82.71)    446    616    (38.11)
8     21678   34359  (58.49)   428    845   (97.42)    892    1286   (44.17)
16    22046   34401  (56.04)   841    1677  (99.40)    1785   2585   (44.81)
24    22396   35117  (56.80)   1272   2447  (92.37)    2667   3863   (44.84)
32    22750   35158  (54.54)   1719   3233  (88.07)    3569   5143   (44.10)
40    23041   35345  (53.40)   2219   3970  (78.90)    4478   6410   (43.14)
48    23209   35219  (51.74)   2707   4685  (73.06)    5386   7684   (42.66)
64    23215   35209  (51.66)   3639   6195  (70.23)    7206   10218  (41.79)
80    23443   35179  (50.06)   4633   7625  (64.58)    9051   12745  (40.81)
96    24006   36108  (50.41)   5635   9096  (61.41)    10864  15283  (40.67)
128   23601   35744  (51.45)   7475   12104 (61.92)    14495  20405  (40.77)
______________________________________________________________________
SUM: BW: (37.6) CPU: (69.0) RCPU: (41.2)

______________________________________________________________________
                    #txqs = 8 (#vhosts = 5)
#     BW1     BW2    (%)       CPU1   CPU2  (%)        RCPU1  RCPU2  (%)
______________________________________________________________________
1     77344   75341  (-2.58)   172    171   (-.58)     358    356    (-.55)
2     20924   26872  (28.42)   107    135   (26.16)    220    262    (19.09)
4     21629   33594  (55.31)   214    394   (84.11)    446    615    (37.89)
8     21678   39714  (83.19)   428    949   (121.72)   892    1358   (52.24)
16    22046   39879  (80.88)   841    1791  (112.96)   1785   2737   (53.33)
24    22396   38436  (71.61)   1272   2111  (65.95)    2667   3453   (29.47)
32    22750   38776  (70.44)   1719   3594  (109.07)   3569   5421   (51.89)
40    23041   38023  (65.02)   2219   4358  (96.39)    4478   6507   (45.31)
48    23209   33811  (45.68)   2707   4047  (49.50)    5386   6222   (15.52)
64    23215   30212  (30.13)   3639   3858  (6.01)     7206   5819   (-19.24)
80    23443   34497  (47.15)   4633   7214  (55.70)    9051   10776  (19.05)
96    24006   30990  (29.09)   5635   5731  (1.70)     10864  8799   (-19.00)
128   23601   29413  (24.62)   7475   7804  (4.40)     14495  11638  (-19.71)
______________________________________________________________________
SUM: BW: (40.1) CPU: (35.7) RCPU: (4.1)


The SD numbers are also good (same table as before, but with SD
instead of CPU):

______________________________________________________________________
                    #txqs = 2 (#vhosts = 3)
#     BW%     SD1    SD2    (%)        RSD1   RSD2   (%)
______________________________________________________________________
1     -3.06   5      4      (-20.00)   21     19     (-9.52)
2     .87     6      6      (0)        27     27     (0)
4     52.16   26     32     (23.07)    108    103    (-4.62)
8     58.49   103    146    (41.74)    431    445    (3.24)
16    56.04   407    514    (26.28)    1729   1586   (-8.27)
24    56.80   934    1161   (24.30)    3916   3665   (-6.40)
32    54.54   1668   2160   (29.49)    6925   6872   (-.76)
40    53.40   2655   3317   (24.93)    10712  10707  (-.04)
48    51.74   3920   4486   (14.43)    15598  14715  (-5.66)
64    51.66   7096   8250   (16.26)    28099  27211  (-3.16)
80    50.06   11240  12586  (11.97)    43913  42070  (-4.19)
96    50.41   16342  16976 

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Michael S. Tsirkin
On Thu, Oct 14, 2010 at 01:28:58PM +0530, Krishna Kumar2 wrote:
 Michael S. Tsirkin m...@redhat.com wrote on 10/12/2010 10:39:07 PM:
 
   Sorry for the delay, I was sick last couple of days. The results
   with your patch are (%'s over original code):
  
   Code   BW%   CPU%   RemoteCPU
   MQ (#txq=16)   31.4% 38.42% 6.41%
   MQ+MST (#txq=16)   28.3% 18.9%  -10.77%
  
   The patch helps CPU utilization but didn't help single stream
   drop.
  
   Thanks,
 
  What other shared TX/RX locks are there?  In your setup, is the same
  macvtap socket structure used for RX and TX?  If yes this will create
  cacheline bounces as sk_wmem_alloc/sk_rmem_alloc share a cache line,
  there might also be contention on the lock in sk_sleep waitqueue.
  Anything else?
 
 The patch is not introducing any locking (both vhost and virtio-net).
 The single stream drop is due to different vhost threads handling the
 RX/TX traffic.
 
 I added a heuristic (fuzzy) to determine if more than one flow
 is being used on the device, and if not, use vhost[0] for both
 tx and rx (vhost_poll_queue figures this out before waking up
 the suitable vhost thread).  Testing shows that single stream
 performance is as good as the original code.

...

 This approach works nicely for both single and multiple stream.
 Does this look good?
 
 Thanks,
 
 - KK

Yes, but I guess it depends on the heuristic :) What's the logic?

-- 
MST


[PATCH 00/11] Descriptions for patches of qemu mce.

2010-10-14 Thread Jin Dongming
These patches make the following changes.
1. Clean up:
 - Factor similar parts into one shared function.
 - Modularize the functions that set up SRAO and SRAR data.
2. Unify sigbus handling:
 - kvm_handle_sigbus can handle both cases of SIGBUS listed below:
     A) Received by the Main thread
     B) Received by VCPU threads
3. Change broadcast:
 - Broadcast SRAR the same way as SRAO.
 - Broadcast SRAO received by VCPU threads the same way as when received
   by the Main thread.
 - Broadcast the MCE depending on the cpu version,
   according to the x86 ASDM vol.3A 15.10.4.1.
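The cpu-version gate can be sketched as follows (hedged: the family-6/model-14 cutoff is an assumption approximating the cited SDM section, not taken from this series):

```c
/* Illustrative check for whether the guest CPU model is assumed to
 * broadcast machine checks to all logical processors: P6-family parts
 * from model 14 on, or any newer family. Not verified against the
 * actual patch 11/11. */
static int mca_broadcast_supported(int family, int model)
{
    return (family == 6 && model >= 14) || family > 6;
}
```

With such a predicate, the injection path would loop over all vcpus only when it returns non-zero, and inject on the single affected vcpu otherwise.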

=
  [PATCH 01/11]kvm, x86: ignore SRAO only when MCG_SER_P is available
  [PATCH 02/11]kvm, x86: introduce kvm_do_set_mce
  [PATCH 03/11]kvm, x86: introduce kvm_mce_in_progress
  [PATCH 04/11]kvm, x86: kvm_mce_inj_* subroutines for templated error injections
  [PATCH 05/11]kvm, x86: introduce kvm_inject_x86_mce_on
  [PATCH 06/11]kvm, x86: use target_phys_addr_t
  [PATCH 07/11]kvm, x86: unify sigbus handling, prep
  [PATCH 08/11]kvm, x86: unify sigbus handling
  [PATCH 09/11]kvm, x86: unify sigbus handling, post1
  [PATCH 10/11]kvm, x86: unify sigbus handling, post2
  [PATCH 11/11]kvm, x86: broadcast mce depending on the cpu version

 qemu-kvm.c |  300 
 1 files changed, 162 insertions(+), 138 deletions(-)




[PATCH 01/11] kvm, x86: ignore SRAO only when MCG_SER_P is available

2010-10-14 Thread Jin Dongming
Also restructure this block to call kvm_mce_in_exception() only when it
is required.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index e78d850..6f62973 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1903,12 +1903,15 @@ static void kvm_do_inject_x86_mce(void *_data)
     struct kvm_x86_mce_data *data = _data;
     int r;
 
-    /* If there is an MCE excpetion being processed, ignore this SRAO MCE */
-    r = kvm_mce_in_exception(data->env);
-    if (r == -1) {
-        fprintf(stderr, "Failed to get MCE status\n");
-    } else if (r && !(data->mce->status & MCI_STATUS_AR)) {
-        return;
+    /* If there is an MCE exception being processed, ignore this SRAO MCE */
+    if ((data->env->mcg_cap & MCG_SER_P) &&
+        !(data->mce->status & MCI_STATUS_AR)) {
+        r = kvm_mce_in_exception(data->env);
+        if (r == -1) {
+            fprintf(stderr, "Failed to get MCE status\n");
+        } else if (r) {
+            return;
+        }
     }
     r = kvm_set_mce(data->env, data->mce);
     if (r < 0) {
-- 
1.7.1.1



[PATCH 02/11] kvm, x86: introduce kvm_do_set_mce

2010-10-14 Thread Jin Dongming
Share the same error handling.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   31 +++
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 6f62973..1338e99 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1130,6 +1130,22 @@ static void sigbus_reraise(void)
     abort();
 }
 
+#if defined(KVM_CAP_MCE) && defined(TARGET_I386)
+static void kvm_do_set_mce(CPUState *env, struct kvm_x86_mce *mce,
+                           int abort_on_error)
+{
+    int r;
+
+    r = kvm_set_mce(env, mce);
+    if (r < 0) {
+        perror("kvm_set_mce FAILED");
+        if (abort_on_error) {
+            abort();
+        }
+    }
+}
+#endif
+
 static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
                            void *ctx)
 {
@@ -1365,11 +1381,7 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
             }
         }
         mce.addr = paddr;
-        r = kvm_set_mce(env, &mce);
-        if (r < 0) {
-            fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-            abort();
-        }
+        kvm_do_set_mce(env, &mce, 1);
     } else
 #endif
 {
@@ -1913,13 +1925,8 @@ static void kvm_do_inject_x86_mce(void *_data)
             return;
         }
     }
-    r = kvm_set_mce(data->env, data->mce);
-    if (r < 0) {
-        perror("kvm_set_mce FAILED");
-        if (data->abort_on_error) {
-            abort();
-        }
-    }
+
+    kvm_do_set_mce(data->env, data->mce, data->abort_on_error);
 }
 #endif
 
-- 
1.7.1.1



[PATCH 03/11] kvm, x86: introduce kvm_mce_in_progress

2010-10-14 Thread Jin Dongming
Share the same error handling, and put it in #ifdef MCE && i386.
Rename this function after the MCIP (Machine Check In Progress) flag.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   47 ---
 1 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 1338e99..a71c07c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1131,6 +1131,21 @@ static void sigbus_reraise(void)
 }
 
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
+static int kvm_mce_in_progress(CPUState *env)
+{
+    struct kvm_msr_entry msr_mcg_status = {
+        .index = MSR_MCG_STATUS,
+    };
+    int r;
+
+    r = kvm_get_msrs(env, &msr_mcg_status, 1);
+    if (r == -1 || r == 0) {
+        perror("Failed to get MCE status");
+        return 0;
+    }
+    return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
+}
+
 static void kvm_do_set_mce(CPUState *env, struct kvm_x86_mce *mce,
                            int abort_on_error)
 {
@@ -1315,20 +1330,6 @@ static void flush_queued_work(CPUState *env)
 pthread_cond_broadcast(qemu_work_cond);
 }
 
-static int kvm_mce_in_exception(CPUState *env)
-{
-    struct kvm_msr_entry msr_mcg_status = {
-        .index = MSR_MCG_STATUS,
-    };
-    int r;
-
-    r = kvm_get_msrs(env, &msr_mcg_status, 1);
-    if (r == -1 || r == 0) {
-        return -1;
-    }
-    return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
-}
-
 static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
@@ -1338,7 +1339,6 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
     void *vaddr;
     ram_addr_t ram_addr;
     unsigned long paddr;
-    int r;
 
     if ((env->mcg_cap & MCG_SER_P) && siginfo->si_addr
         && (siginfo->si_code == BUS_MCEERR_AR
@@ -1355,12 +1355,10 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
              * If there is an MCE excpetion being processed, ignore
              * this SRAO MCE
              */
-            r = kvm_mce_in_exception(env);
-            if (r == -1) {
-                fprintf(stderr, "Failed to get MCE status\n");
-            } else if (r) {
+            if (kvm_mce_in_progress(env)) {
                 return;
             }
+
             /* Fake an Intel architectural Memory scrubbing UCR */
             mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
                 | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
@@ -1913,17 +1911,12 @@ struct kvm_x86_mce_data {
 static void kvm_do_inject_x86_mce(void *_data)
 {
     struct kvm_x86_mce_data *data = _data;
-    int r;
 
     /* If there is an MCE exception being processed, ignore this SRAO MCE */
     if ((data->env->mcg_cap & MCG_SER_P) &&
-        !(data->mce->status & MCI_STATUS_AR)) {
-        r = kvm_mce_in_exception(data->env);
-        if (r == -1) {
-            fprintf(stderr, "Failed to get MCE status\n");
-        } else if (r) {
-            return;
-        }
+        !(data->mce->status & MCI_STATUS_AR) &&
+        kvm_mce_in_progress(data->env)) {
+        return;
     }
 
     kvm_do_set_mce(data->env, data->mce, data->abort_on_error);
-- 
1.7.1.1




[PATCH 04/11] kvm, x86: kvm_mce_inj_* subroutines for templated error injections

2010-10-14 Thread Jin Dongming
Refactor the code for maintainability.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   96 ---
 1 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index a71c07c..9f248f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1159,6 +1159,51 @@ static void kvm_do_set_mce(CPUState *env, struct kvm_x86_mce *mce,
         }
     }
 }
+
+static void kvm_mce_inj_srar_dataload(CPUState *env, unsigned long paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | MCI_STATUS_AR | 0x134,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+
+    kvm_do_set_mce(env, &mce, 1);
+}
+
+static void kvm_mce_inj_srao_memscrub(CPUState *env, unsigned long paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+
+    kvm_do_set_mce(env, &mce, 1);
+}
+
+static void kvm_mce_inj_srao_broadcast(unsigned long paddr)
+{
+    CPUState *cenv;
+
+    kvm_inject_x86_mce(first_cpu, 9,
+                       MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                       | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                       | 0xc0,
+                       MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
+                       (MCM_ADDR_PHYS << 6) | 0xc, 1);
+    for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
+        kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+                           MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+}
 #endif
 
 static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
@@ -1167,11 +1212,9 @@ static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
     if ((first_cpu->mcg_cap & MCG_SER_P) && siginfo->ssi_addr
         && siginfo->ssi_code == BUS_MCEERR_AO) {
-        uint64_t status;
         void *vaddr;
         ram_addr_t ram_addr;
         unsigned long paddr;
-        CPUState *cenv;
 
         /* Hope we are lucky for AO MCE */
         vaddr = (void *)(intptr_t)siginfo->ssi_addr;
@@ -1182,16 +1225,7 @@ static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
                     (unsigned long long)siginfo->ssi_addr);
             return;
         }
-        status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-            | 0xc0;
-        kvm_inject_x86_mce(first_cpu, 9, status,
-                           MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                           (MCM_ADDR_PHYS << 6) | 0xc, 1);
-        for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
-        }
+        kvm_mce_inj_srao_broadcast(paddr);
     } else
 #endif
 {
@@ -1333,9 +1367,6 @@ static void flush_queued_work(CPUState *env)
 static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
-    struct kvm_x86_mce mce = {
-        .bank = 9,
-    };
     void *vaddr;
     ram_addr_t ram_addr;
     unsigned long paddr;
@@ -1343,28 +1374,12 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
     if ((env->mcg_cap & MCG_SER_P) && siginfo->si_addr
         && (siginfo->si_code == BUS_MCEERR_AR
             || siginfo->si_code == BUS_MCEERR_AO)) {
-        if (siginfo->si_code == BUS_MCEERR_AR) {
-            /* Fake an Intel architectural Data Load SRAR UCR */
-            mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                | MCI_STATUS_AR | 0x134;
-            mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
-            mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
-        } else {
-            /*
-             * If there is an MCE excpetion being processed, ignore
-             * this SRAO MCE
-             */
-            if (kvm_mce_in_progress(env)) {
-                return;
-            }
-
-            /* Fake an Intel architectural Memory scrubbing UCR */
-            mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                | 0xc0;
-            mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
-

[PATCH 05/11] kvm, x86: introduce kvm_inject_x86_mce_on

2010-10-14 Thread Jin Dongming
Pass a structure instead of multiple args.

Note:

kvm_inject_x86_mce(env, bank, status, mcg_status, addr, misc,
   abort_on_error);

is equal to:

struct kvm_x86_mce mce = {
.bank = bank,
.status = status,
.mcg_status = mcg_status,
.addr = addr,
.misc = misc,
};
kvm_inject_x86_mce_on(env, &mce, abort_on_error);

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   56 ++--
 1 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 9f248f0..0ba42fc 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1131,6 +1131,9 @@ static void sigbus_reraise(void)
 }
 
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
+static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
+                                  int abort_on_error);
+
 static int kvm_mce_in_progress(CPUState *env)
 {
 struct kvm_msr_entry msr_mcg_status = {
@@ -1192,17 +1195,27 @@ static void kvm_mce_inj_srao_memscrub(CPUState *env, unsigned long paddr)
 
 static void kvm_mce_inj_srao_broadcast(unsigned long paddr)
 {
+    struct kvm_x86_mce mce_srao_memscrub = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    struct kvm_x86_mce mce_dummy = {
+        .bank = 1,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = 0,
+        .misc = 0,
+    };
     CPUState *cenv;
 
-    kvm_inject_x86_mce(first_cpu, 9,
-                       MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                       | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                       | 0xc0,
-                       MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                       (MCM_ADDR_PHYS << 6) | 0xc, 1);
+    kvm_inject_x86_mce_on(first_cpu, &mce_srao_memscrub, 1);
     for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
-        kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                           MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+        kvm_inject_x86_mce_on(cenv, &mce_dummy, 1);
 }
 #endif
 
@@ -1941,6 +1954,22 @@ static void kvm_do_inject_x86_mce(void *_data)
 
     kvm_do_set_mce(data->env, data->mce, data->abort_on_error);
 }
+
+static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
+                                  int abort_on_error)
+{
+    struct kvm_x86_mce_data data = {
+        .env = env,
+        .mce = mce,
+        .abort_on_error = abort_on_error,
+    };
+
+    if (!env->mcg_cap) {
+        perror("MCE support is not enabled!");
+        return;
+    }
+    on_vcpu(env, kvm_do_inject_x86_mce, &data);
+}
 #endif
 
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
@@ -1955,17 +1984,8 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         .addr = addr,
         .misc = misc,
     };
-    struct kvm_x86_mce_data data = {
-        .env = cenv,
-        .mce = &mce,
-        .abort_on_error = abort_on_error,
-    };
 
-    if (!cenv->mcg_cap) {
-        fprintf(stderr, "MCE support is not enabled!\n");
-        return;
-    }
-    on_vcpu(cenv, kvm_do_inject_x86_mce, &data);
+    kvm_inject_x86_mce_on(cenv, &mce, abort_on_error);
 #else
     if (abort_on_error) {
         abort();
-- 
1.7.1.1




[PATCH 06/11] kvm, x86: use target_phys_addr_t

2010-10-14 Thread Jin Dongming
Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 0ba42fc..89ae524 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1163,7 +1163,7 @@ static void kvm_do_set_mce(CPUState *env, struct kvm_x86_mce *mce,
     }
 }
 
-static void kvm_mce_inj_srar_dataload(CPUState *env, unsigned long paddr)
+static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
 {
     struct kvm_x86_mce mce = {
         .bank = 9,
@@ -1178,7 +1178,7 @@ static void kvm_mce_inj_srar_dataload(CPUState *env, unsigned long paddr)
     kvm_do_set_mce(env, &mce, 1);
 }
 
-static void kvm_mce_inj_srao_memscrub(CPUState *env, unsigned long paddr)
+static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
 {
     struct kvm_x86_mce mce = {
         .bank = 9,
@@ -1193,7 +1193,7 @@ static void kvm_mce_inj_srao_memscrub(CPUState *env, unsigned long paddr)
     kvm_do_set_mce(env, &mce, 1);
 }
 
-static void kvm_mce_inj_srao_broadcast(unsigned long paddr)
+static void kvm_mce_inj_srao_broadcast(target_phys_addr_t paddr)
 {
     struct kvm_x86_mce mce_srao_memscrub = {
         .bank = 9,
@@ -1227,12 +1227,12 @@ static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
         && siginfo->ssi_code == BUS_MCEERR_AO) {
         void *vaddr;
         ram_addr_t ram_addr;
-        unsigned long paddr;
+        target_phys_addr_t paddr;
 
         /* Hope we are lucky for AO MCE */
         vaddr = (void *)(intptr_t)siginfo->ssi_addr;
         if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, (target_phys_addr_t *)&paddr)) {
+            !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!: %llx\n",
                     (unsigned long long)siginfo->ssi_addr);
@@ -1382,7 +1382,7 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
     void *vaddr;
     ram_addr_t ram_addr;
-    unsigned long paddr;
+    target_phys_addr_t paddr;
 
     if ((env->mcg_cap & MCG_SER_P) && siginfo->si_addr
         && (siginfo->si_code == BUS_MCEERR_AR
@@ -1396,7 +1396,7 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
         }
         vaddr = (void *)siginfo->si_addr;
         if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, (target_phys_addr_t *)&paddr)) {
+            !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!\n");
             /* Hope we are lucky for AO MCE */
-- 
1.7.1.1




[PATCH 07/11] kvm, x86: unify sigbus handling, prep

2010-10-14 Thread Jin Dongming
There are 2 similar functions to handle SIGBUS:
  sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
                 void *ctx)
  kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)

The former is used when the main thread receives SIGBUS via signalfd,
while the latter is used when a vcpu thread receives SIGBUS.
The two take different siginfo types, but in both cases the required
parameters are the same: the code and the addr in the info.

Restructure the functions to take the code and the addr explicitly.
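The common shape once siginfo has been reduced to (code, addr) can be sketched like this (a hedged, self-contained sketch; `handle_sigbus_code` and the action enum are illustrative names, not the patch's actual API):

```c
#include <signal.h>

/* Linux defines the hwpoison si_codes; provide fallbacks so the sketch
 * is self-contained (values match the kernel ABI). */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#define BUS_MCEERR_AO 5
#endif

enum sb_action { SB_IGNORE, SB_MEM_ERROR, SB_RERAISE };

/* Shared policy: AO (action optional) errors are survivable, AR (action
 * required) errors are fatal for us, anything else is re-raised. */
static enum sb_action handle_sigbus_code(int code, const void *addr)
{
    (void)addr;  /* a real handler would translate addr to a guest paddr */
    if (code == BUS_MCEERR_AO)
        return SB_IGNORE;
    if (code == BUS_MCEERR_AR)
        return SB_MEM_ERROR;
    return SB_RERAISE;
}
```

Both entry points then become thin wrappers that extract (code, addr) from their respective siginfo structures and call the shared logic.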

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   41 -
 1 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 89ae524..b58181a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1219,32 +1219,28 @@ static void kvm_mce_inj_srao_broadcast(target_phys_addr_t paddr)
 }
 #endif
 
-static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
-                           void *ctx)
+static void kvm_handle_sigbus(int code, void *vaddr)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
-    if ((first_cpu->mcg_cap & MCG_SER_P) && siginfo->ssi_addr
-        && siginfo->ssi_code == BUS_MCEERR_AO) {
-        void *vaddr;
+    if ((first_cpu->mcg_cap & MCG_SER_P) && vaddr && code == BUS_MCEERR_AO) {
         ram_addr_t ram_addr;
         target_phys_addr_t paddr;
 
         /* Hope we are lucky for AO MCE */
-        vaddr = (void *)(intptr_t)siginfo->ssi_addr;
         if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
             !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!: %llx\n",
-                    (unsigned long long)siginfo->ssi_addr);
+                    (unsigned long long)vaddr);
             return;
         }
         kvm_mce_inj_srao_broadcast(paddr);
     } else
 #endif
     {
-        if (siginfo->ssi_code == BUS_MCEERR_AO) {
+        if (code == BUS_MCEERR_AO) {
             return;
-        } else if (siginfo->ssi_code == BUS_MCEERR_AR) {
+        } else if (code == BUS_MCEERR_AR) {
             hardware_memory_error();
         } else {
             sigbus_reraise();
@@ -1252,6 +1248,11 @@ static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
     }
 }
 
+static void sigbus_handler(int n, struct qemu_signalfd_siginfo *ssi, void *ctx)
+{
+    kvm_handle_sigbus(ssi->ssi_code, (void *)(intptr_t)ssi->ssi_addr);
+}
+
 static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
 {
 struct qemu_work_item wi;
@@ -1377,36 +1378,34 @@ static void flush_queued_work(CPUState *env)
     pthread_cond_broadcast(&qemu_work_cond);
 }
 
-static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
+static void kvm_on_sigbus(CPUState *env, int code, void *vaddr)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
-    void *vaddr;
     ram_addr_t ram_addr;
     target_phys_addr_t paddr;
 
-    if ((env->mcg_cap & MCG_SER_P) && siginfo->si_addr
-        && (siginfo->si_code == BUS_MCEERR_AR
-            || siginfo->si_code == BUS_MCEERR_AO)) {
+    if ((env->mcg_cap & MCG_SER_P) && vaddr
+        && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
 
         /*
          * If there is an MCE excpetion being processed, ignore this SRAO MCE
         */
-        if (siginfo->si_code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
+        if (code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
             return;
         }
-        vaddr = (void *)siginfo->si_addr;
+
        if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
            !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
            fprintf(stderr, "Hardware memory error for memory used by "
                    "QEMU itself instead of guest system!\n");
            /* Hope we are lucky for AO MCE */
-           if (siginfo->si_code == BUS_MCEERR_AO) {
+           if (code == BUS_MCEERR_AO) {
                return;
            } else {
                hardware_memory_error();
            }
        }
-       if (siginfo->si_code == BUS_MCEERR_AR) {
+       if (code == BUS_MCEERR_AR) {
            /* Fake an Intel architectural Data Load SRAR UCR */
            kvm_mce_inj_srar_dataload(env, paddr);
        } else {
@@ -1416,9 +1415,9 @@ static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
     } else
 #endif
     {
-        if (siginfo->si_code == BUS_MCEERR_AO) {
+        if (code == BUS_MCEERR_AO) {
             return;
-        } else if (siginfo->si_code == BUS_MCEERR_AR) {
+        } else if (code == BUS_MCEERR_AR) {
             hardware_memory_error();
         } else {
             sigbus_reraise();
@@ -1455,7 +1454,7 @@ static void kvm_main_loop_wait(CPUState *env, int timeout)
 
     switch (r) {
     case SIGBUS:
-        kvm_on_sigbus(env, siginfo);
+        kvm_on_sigbus(env, siginfo.si_code, 

[PATCH 08/11] kvm, x86: unify sigbus handling

2010-10-14 Thread Jin Dongming
Now kvm_handle_sigbus can handle both cases of SIGBUS.

Note that env is NULL when the main thread receives SIGBUS via
signalfd; otherwise env points to the vcpu thread that received SIGBUS.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   94 +++-
 1 files changed, 42 insertions(+), 52 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index b58181a..16bc006 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1219,10 +1219,12 @@ static void kvm_mce_inj_srao_broadcast(target_phys_addr_t paddr)
 }
 #endif
 
-static void kvm_handle_sigbus(int code, void *vaddr)
+static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
-    if ((first_cpu->mcg_cap & MCG_SER_P) && vaddr && code == BUS_MCEERR_AO) {
+    /* env == NULL: when main thread received a SIGBUS */
+    if (!env && (first_cpu->mcg_cap & MCG_SER_P) && vaddr
+        && code == BUS_MCEERR_AO) {
         ram_addr_t ram_addr;
         target_phys_addr_t paddr;
 
@@ -1235,7 +1237,42 @@ static void kvm_handle_sigbus(int code, void *vaddr)
             return;
         }
         kvm_mce_inj_srao_broadcast(paddr);
-    } else
+        return;
+    }
+
+/* env != NULL: when vcpu thread received a SIGBUS */
+if (env  (env-mcg_cap  MCG_SER_P)  vaddr
+ (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
+ram_addr_t ram_addr;
+unsigned long paddr;
+
+/*
+ * If there is an MCE excpetion being processed, ignore this SRAO MCE
+ */
+if (code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
+return;
+}
+
+if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
+!kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
+fprintf(stderr, "Hardware memory error for memory used by "
+"QEMU itself instaed of guest system!\n");
+/* Hope we are lucky for AO MCE */
+if (code == BUS_MCEERR_AO) {
+return;
+} else {
+hardware_memory_error();
+}
+}
+if (code == BUS_MCEERR_AR) {
+/* Fake an Intel architectural Data Load SRAR UCR */
+kvm_mce_inj_srar_dataload(env, paddr);
+} else {
+/* Fake an Intel architectural Memory scrubbing UCR */
+kvm_mce_inj_srao_memscrub(env, paddr);
+}
+return;
+}
 #endif
 {
 if (code == BUS_MCEERR_AO) {
@@ -1250,7 +1287,7 @@ static void kvm_handle_sigbus(int code, void *vaddr)
 
 static void sigbus_handler(int n, struct qemu_signalfd_siginfo *ssi, void *ctx)
 {
-kvm_handle_sigbus(ssi->ssi_code, (void *)(intptr_t)ssi->ssi_addr);
+kvm_handle_sigbus(NULL, ssi->ssi_code, (void *)(intptr_t)ssi->ssi_addr);
 }
 
 static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
@@ -1378,53 +1415,6 @@ static void flush_queued_work(CPUState *env)
 pthread_cond_broadcast(&qemu_work_cond);
 }
 
-static void kvm_on_sigbus(CPUState *env, int code, void *vaddr)
-{
-#if defined(KVM_CAP_MCE) && defined(TARGET_I386)
-ram_addr_t ram_addr;
-target_phys_addr_t paddr;
-
-if ((env->mcg_cap & MCG_SER_P) && vaddr
- && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
-
-/*
- * If there is an MCE excpetion being processed, ignore this SRAO MCE
- */
-if (code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
-return;
-}
-
-if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-!kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
-fprintf(stderr, "Hardware memory error for memory used by "
-"QEMU itself instead of guest system!\n");
-/* Hope we are lucky for AO MCE */
-if (code == BUS_MCEERR_AO) {
-return;
-} else {
-hardware_memory_error();
-}
-}
-if (code == BUS_MCEERR_AR) {
-/* Fake an Intel architectural Data Load SRAR UCR */
-kvm_mce_inj_srar_dataload(env, paddr);
-} else {
-/* Fake an Intel architectural Memory scrubbing UCR */
-kvm_mce_inj_srao_memscrub(env, paddr);
-}
-} else
-#endif
-{
-if (code == BUS_MCEERR_AO) {
-return;
-} else if (code == BUS_MCEERR_AR) {
-hardware_memory_error();
-} else {
-sigbus_reraise();
-}
-}
-}
-
 static void kvm_main_loop_wait(CPUState *env, int timeout)
 {
 struct timespec ts;
@@ -1454,7 +1444,7 @@ static void kvm_main_loop_wait(CPUState *env, int timeout)
 
 switch (r) {
 case SIGBUS:
-kvm_on_sigbus(env, siginfo.si_code, (void *)siginfo.si_addr);
+kvm_handle_sigbus(env, siginfo.si_code, (void *)siginfo.si_addr);
 break;
 default:

[PATCH 09/11] kvm, x86: unify sigbus handling, post1

2010-10-14 Thread Jin Dongming
Explicitly duplicate blocks for next cleanup.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   56 +---
 1 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 16bc006..d96394b 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1223,12 +1223,20 @@ static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
 /* env == NULL: when main thread received a SIGBUS */
-if (!env && (first_cpu->mcg_cap & MCG_SER_P) && vaddr
- && code == BUS_MCEERR_AO) {
+if (!env && vaddr && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
 ram_addr_t ram_addr;
 target_phys_addr_t paddr;
 
-/* Hope we are lucky for AO MCE */
+/* Give up MCE forwarding if immediate action required on main thread */
+if (code == BUS_MCEERR_AR) {
+goto out;
+}
+
+/* Check if recoverable MCE support is enabled */
+if (!(first_cpu->mcg_cap & MCG_SER_P)){
+goto out;
+}
+
 if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
 !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
 fprintf(stderr, "Hardware memory error for memory used by "
@@ -1236,19 +1244,22 @@ static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 (unsigned long long)vaddr);
 return;
 }
+/* Broadcast SRAO UCR to all vcpu threads */
 kvm_mce_inj_srao_broadcast(paddr);
 return;
 }
 
 /* env != NULL: when vcpu thread received a SIGBUS */
-if (env && (env->mcg_cap & MCG_SER_P) && vaddr
- && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
+if (env && vaddr && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
 ram_addr_t ram_addr;
 unsigned long paddr;
 
-/*
- * If there is an MCE excpetion being processed, ignore this SRAO MCE
- */
+/* Check if recoverable MCE support is enabled */
+if (!(env->mcg_cap & MCG_SER_P)){
+goto out;
+}
+
+/* If there is an MCE exception being processed, ignore this SRAO MCE */
 if (code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
 return;
 }
@@ -1256,13 +1267,9 @@ static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
 !kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
 fprintf(stderr, "Hardware memory error for memory used by "
-"QEMU itself instaed of guest system!\n");
-/* Hope we are lucky for AO MCE */
-if (code == BUS_MCEERR_AO) {
-return;
-} else {
-hardware_memory_error();
-}
+"QEMU itself instead of guest system!: %llx\n",
+(unsigned long long)vaddr);
+goto out;
 }
 if (code == BUS_MCEERR_AR) {
 /* Fake an Intel architectural Data Load SRAR UCR */
@@ -1273,15 +1280,18 @@ static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 }
 return;
 }
+out:
 #endif
-{
-if (code == BUS_MCEERR_AO) {
-return;
-} else if (code == BUS_MCEERR_AR) {
-hardware_memory_error();
-} else {
-sigbus_reraise();
-}
+/* Hope we are lucky for AO MCE */
+if (code == BUS_MCEERR_AO) {
+return;
+}
+
+/* Abort in either way */
+if (code == BUS_MCEERR_AR) {
+hardware_memory_error();
+} else {
+sigbus_reraise();
 }
 }
 
-- 
1.7.1.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/11] kvm, x86: unify sigbus handling, post2

2010-10-14 Thread Jin Dongming
Cleanup to finish unification.

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   41 -
 1 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index d96394b..d2b2459 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1222,45 +1222,24 @@ static void kvm_mce_inj_srao_broadcast(target_phys_addr_t paddr)
 static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
-/* env == NULL: when main thread received a SIGBUS */
-if (!env && vaddr && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
+if (vaddr && (code == BUS_MCEERR_AO || code == BUS_MCEERR_AR)) {
 ram_addr_t ram_addr;
 target_phys_addr_t paddr;
+CPUState *target_env;
 
 /* Give up MCE forwarding if immediate action required on main thread */
-if (code == BUS_MCEERR_AR) {
+if (!env && code == BUS_MCEERR_AR) {
 goto out;
 }
 
 /* Check if recoverable MCE support is enabled */
-if (!(first_cpu->mcg_cap & MCG_SER_P)){
-goto out;
-}
-
-if (do_qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-!kvm_physical_memory_addr_from_ram(kvm_state, ram_addr, &paddr)) {
-fprintf(stderr, "Hardware memory error for memory used by "
-"QEMU itself instead of guest system!: %llx\n",
-(unsigned long long)vaddr);
-return;
-}
-/* Broadcast SRAO UCR to all vcpu threads */
-kvm_mce_inj_srao_broadcast(paddr);
-return;
-}
-
-/* env != NULL: when vcpu thread received a SIGBUS */
-if (env && vaddr && (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
-ram_addr_t ram_addr;
-unsigned long paddr;
-
-/* Check if recoverable MCE support is enabled */
-if (!(env->mcg_cap & MCG_SER_P)){
+target_env = env ? env : first_cpu;
+if (!target_env || !(target_env->mcg_cap & MCG_SER_P)) {
 goto out;
 }
 
 /* If there is an MCE exception being processed, ignore this SRAO MCE */
-if (code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
+if (env && code == BUS_MCEERR_AO && kvm_mce_in_progress(env)) {
 return;
 }
 
@@ -1273,10 +1252,14 @@ static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 }
 if (code == BUS_MCEERR_AR) {
 /* Fake an Intel architectural Data Load SRAR UCR */
-kvm_mce_inj_srar_dataload(env, paddr);
+kvm_mce_inj_srar_dataload(target_env, paddr);
 } else {
 /* Fake an Intel architectural Memory scrubbing UCR */
-kvm_mce_inj_srao_memscrub(env, paddr);
+if (env) {
+kvm_mce_inj_srao_memscrub(target_env, paddr);
+} else {
+kvm_mce_inj_srao_broadcast(paddr);
+}
 }
 return;
 }
-- 
1.7.1.1




[PATCH 11/11] kvm, x86: broadcast mce depending on the cpu version

2010-10-14 Thread Jin Dongming
There is no reason why an SRAO event received by the main thread
should be the only one that gets broadcast.

According to the x86 ASDM vol.3A 15.10.4.1,
MCE signal is broadcast on processor version 06H_EH or later.

This change is required to handle SRAR in the guest.
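
[Editor's note] The version check the commit message refers to decodes family
and model from the CPUID version dword (leaf 1, EAX), exactly as the patch
does with env->cpuid_version. A standalone sketch (the threshold, family 6 /
model 14, follows the commit message; extended family is ignored, as in the
patch):

```c
/* Decode display family/model from CPUID leaf 1 EAX, as the patch does
 * with env->cpuid_version, and apply the broadcast rule: family 6 with
 * model >= 14, or any later family, broadcasts MCE to all vcpus. */
static int mce_is_broadcast(unsigned int cpuver)
{
    int family = (cpuver >> 8) & 0xf;
    int model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);

    return (family == 6 && model >= 14) || family > 6;
}
```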

Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
Tested-by: Jin Dongming jin.dongm...@np.css.fujitsu.com
---
 qemu-kvm.c |   63 +--
 1 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index d2b2459..846f0b6 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1149,6 +1149,34 @@ static int kvm_mce_in_progress(CPUState *env)
 return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
 }
 
+static void kvm_mce_inj_broadcast(CPUState *env, struct kvm_x86_mce *mce)
+{
+struct kvm_x86_mce mce_sub = {
+.bank = 1,
+.status = MCI_STATUS_VAL | MCI_STATUS_UC,
+.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+.addr = 0,
+.misc = 0,
+};
+CPUState *cenv;
+int family, model, cpuver = env-cpuid_version;
+
+family = (cpuver >> 8) & 0xf;
+model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
+
+kvm_inject_x86_mce_on(env, mce, 1);
+
+/* Broadcast MCA signal for processor version 06H_EH and above */
+if ((family == 6 && model >= 14) || family > 6) {
+for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+if (cenv == env) {
+continue;
+}
+kvm_inject_x86_mce_on(cenv, &mce_sub, 1);
+}
+}
+}
+
 static void kvm_do_set_mce(CPUState *env, struct kvm_x86_mce *mce,
int abort_on_error)
 {
@@ -1175,7 +1203,7 @@ static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
 .misc = (MCM_ADDR_PHYS << 6) | 0xc,
 };
 
-kvm_do_set_mce(env, &mce, 1);
+kvm_mce_inj_broadcast(env, &mce);
 }
 
 static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
@@ -1190,32 +1218,7 @@ static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
 .misc = (MCM_ADDR_PHYS << 6) | 0xc,
 };
 
-kvm_do_set_mce(env, &mce, 1);
-}
-
-static void kvm_mce_inj_srao_broadcast(target_phys_addr_t paddr)
-{
-struct kvm_x86_mce mce_srao_memscrub = {
-.bank = 9,
-.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-  | 0xc0,
-.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-.addr = paddr,
-.misc = (MCM_ADDR_PHYS << 6) | 0xc,
-};
-struct kvm_x86_mce mce_dummy = {
-.bank = 1,
-.status = MCI_STATUS_VAL | MCI_STATUS_UC,
-.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
-.addr = 0,
-.misc = 0,
-};
-CPUState *cenv;
-
-kvm_inject_x86_mce_on(first_cpu, &mce_srao_memscrub, 1);
-for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
-kvm_inject_x86_mce_on(cenv, &mce_dummy, 1);
+kvm_mce_inj_broadcast(env, &mce);
 }
 #endif
 
@@ -1255,11 +1258,7 @@ static void kvm_handle_sigbus(CPUState *env, int code, void *vaddr)
 kvm_mce_inj_srar_dataload(target_env, paddr);
 } else {
 /* Fake an Intel architectural Memory scrubbing UCR */
-if (env) {
-kvm_mce_inj_srao_memscrub(target_env, paddr);
-} else {
-kvm_mce_inj_srao_broadcast(paddr);
-}
+kvm_mce_inj_srao_memscrub(target_env, paddr);
 }
 return;
 }
-- 
1.7.1.1




Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
> Michael S. Tsirkin m...@redhat.com
> > > What other shared TX/RX locks are there?  In your setup, is the same
> > > macvtap socket structure used for RX and TX?  If yes this will create
> > > cacheline bounces as sk_wmem_alloc/sk_rmem_alloc share a cache line,
> > > there might also be contention on the lock in sk_sleep waitqueue.
> > > Anything else?
> >
> > The patch is not introducing any locking (both vhost and virtio-net).
> > The single stream drop is due to different vhost threads handling the
> > RX/TX traffic.
> >
> > I added a heuristic (fuzzy) to determine if more than one flow
> > is being used on the device, and if not, use vhost[0] for both
> > tx and rx (vhost_poll_queue figures this out before waking up
> > the suitable vhost thread).  Testing shows that single stream
> > performance is as good as the original code.
>
> ...
>
> > This approach works nicely for both single and multiple stream.
> > Does this look good?
> >
> > Thanks,
> >
> > - KK
>
> Yes, but I guess it depends on the heuristic :) What's the logic?

I define how recently a txq was used. If 0 or 1 txqs were used
recently, use vq[0] (which also handles rx). Otherwise, use the
multiple txqs (vq[1-n]). The code is:

/*
 * Algorithm for selecting vq:
 *
 * Condition                                    Return
 * RX vq                                        vq[0]
 * If all txqs unused                           vq[0]
 * If one txq used, and new txq is same         vq[0]
 * If one txq used, and new txq is different    vq[vq->qnum]
 * If > 1 txqs used                             vq[vq->qnum]
 *  Where "used" means the txq was used in the last 'n' jiffies.
 *
 * Note: locking is not required as an update race will only result in
 * a different worker being woken up.
 */
static inline struct vhost_virtqueue *vhost_find_vq(struct vhost_poll
*poll)
{
        if (poll->vq->qnum) {
                struct vhost_dev *dev = poll->vq->dev;
                struct vhost_virtqueue *vq = dev->vqs[0];
                unsigned long max_time = jiffies - 5; /* Some macro needed */
                unsigned long *table = dev->jiffies;
                int i, used = 0;

                for (i = 0; i < dev->nvqs - 1; i++) {
                        if (time_after_eq(table[i], max_time) && ++used > 1) {
                                vq = poll->vq;
                                break;
                        }
                }
                table[poll->vq->qnum - 1] = jiffies;
                return vq;
        }

        /* RX is handled by the same worker thread */
        return poll->vq;
}

void vhost_poll_queue(struct vhost_poll *poll)
{
struct vhost_virtqueue *vq = vhost_find_vq(poll);

        vhost_work_queue(vq, &poll->work);
}

Since poll batches packets, find_vq does not seem to add much
to the CPU utilization (or BW). I am sure that code can be
optimized much better.
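
[Editor's note] Stripped of the vhost types, the heuristic above reduces to
counting tx queues touched within a small jiffy window; a standalone sketch
under that reading (the names and the WINDOW constant are illustrative):

```c
/* Count tx queues used within the last WINDOW ticks.  With at most one
 * recently-active txq, fall back to queue 0 (shared with rx) to keep
 * single-stream performance; otherwise use the per-queue worker.
 * Returns the selected vq index: 0, or this_txq + 1 (vq[0] is rx). */
#define WINDOW 5

static int pick_vq(unsigned long now, unsigned long *last_used,
                   int ntxq, int this_txq)
{
    int i, used = 0;

    for (i = 0; i < ntxq; i++)
        if (last_used[i] + WINDOW >= now && ++used > 1)
            break;
    last_used[this_txq] = now;
    return used > 1 ? this_txq + 1 : 0;
}
```

As in the original, an update race on the timestamp table only steers the
work to a different worker, so no locking is needed.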

The results I sent in my last mail were without your use_mm
patch, and the only tuning was to make vhost threads run on
only cpus 0-3 (though the performance is good even without
that). I will test it later today with the use_mm patch too.

Thanks,

- KK



[PATCH v7 09/12] Inject asynchronous page fault into a PV guest if page is swapped out.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.

Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.

The vcpu will be halted if the guest faults on the same page again or if
the vcpu executes kernel code.
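
[Editor's note] The token this patch constructs in kvm_arch_setup_async_pf()
(arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id) packs a per-vcpu
sequence number above the vcpu id. A standalone sketch of that layout (the
explicit 12-bit mask on vcpu_id is added here; the patch relies on small
vcpu ids):

```c
#include <stdint.h>

/* Sketch of the async-PF token layout: a per-vcpu sequence counter in
 * bits 31..12 and the vcpu id in bits 11..0, so a token identifies both
 * the vcpu and the individual fault (and ~0 can serve as the
 * "broadcast wakeup" token used on error pages). */
static uint32_t apf_make_token(uint32_t *seq, uint32_t vcpu_id)
{
    return ((*seq)++ << 12) | (vcpu_id & 0xfff);
}

static uint32_t apf_token_vcpu(uint32_t token)
{
    return token & 0xfff;
}
```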

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |3 ++
 arch/x86/kvm/mmu.c  |1 +
 arch/x86/kvm/x86.c  |   43 ++
 include/trace/events/kvm.h  |   17 ++-
 virt/kvm/async_pf.c |3 +-
 5 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26b2064..f1868ed 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -421,6 +421,7 @@ struct kvm_vcpu_arch {
gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
struct gfn_to_hva_cache data;
u64 msr_val;
+   u32 id;
} apf;
 };
 
@@ -596,6 +597,7 @@ struct kvm_x86_ops {
 };
 
 struct kvm_arch_async_pf {
+   u32 token;
gfn_t gfn;
 };
 
@@ -843,6 +845,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
   struct kvm_async_pf *work);
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 11d152b..463ff2e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2590,6 +2590,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
 {
struct kvm_arch_async_pf arch;
+   arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
arch.gfn = gfn;
 
   return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68a3a06..8e2fc59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6233,20 +6233,53 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
}
 }
 
+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+
+   return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
+ sizeof(val));
+}
+
 void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work)
 {
-   trace_kvm_async_pf_not_present(work->gva);
-
-   kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+   trace_kvm_async_pf_not_present(work->arch.token, work->gva);
    kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
+
+   if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
+   kvm_x86_ops->get_cpl(vcpu) == 0)
+   kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+   else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+   vcpu->arch.fault.error_code = 0;
+   vcpu->arch.fault.address = work->arch.token;
+   kvm_inject_page_fault(vcpu);
+   }
 }
 
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work)
 {
-   trace_kvm_async_pf_ready(work->gva);
-   kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+   trace_kvm_async_pf_ready(work->arch.token, work->gva);
+   if (is_error_page(work->page))
+   work->arch.token = ~0; /* broadcast wakeup */
+   else
+   kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+
+   if ((vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) &&
+   !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+   vcpu->arch.fault.error_code = 0;
+   vcpu->arch.fault.address = work->arch.token;
+   kvm_inject_page_fault(vcpu);
+   }
+}
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+{
+   if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
+   return true;
+   else
+   return !kvm_event_needs_reinjection(vcpu) &&
+   kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a78a5e5..9c2cc6a 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -204,34 +204,39 @@ TRACE_EVENT(
 
 TRACE_EVENT(
kvm_async_pf_not_present,
-   TP_PROTO(u64 gva),
-   TP_ARGS(gva),
+   TP_PROTO(u64 token, u64 gva),
+   TP_ARGS(token, gva),
 
TP_STRUCT__entry(
+   __field(__u64, token)
__field(__u64, gva)

[PATCH v7 11/12] Let host know whether the guest can handle async PF in non-userspace context.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.
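
[Editor's note] Putting the two flag bits together with the MSR layout from
msr.txt, the value a preemptible guest writes can be sketched as follows
(local stand-in macros; the real names live in asm/kvm_para.h):

```c
#include <stdint.h>

/* Sketch of the MSR_KVM_ASYNC_PF_EN encoding: bits 63-6 carry the
 * 64-byte-aligned address of the per-cpu apf_reason area, bit 0
 * enables async PF, and bit 1 (SEND_ALWAYS) tells the host the guest
 * can take async PFs outside userspace (set on CONFIG_PREEMPT). */
#define APF_ENABLED     (1ULL << 0)
#define APF_SEND_ALWAYS (1ULL << 1)

static uint64_t async_pf_msr(uint64_t pa, int preemptible)
{
    uint64_t val = (pa & ~0x3fULL) | APF_ENABLED;

    if (preemptible)
        val |= APF_SEND_ALWAYS;
    return val;
}
```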

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kvm/msr.txt   |6 +++---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvm.c   |3 +++
 arch/x86/kvm/x86.c  |5 +++--
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/kvm/msr.txt b/Documentation/kvm/msr.txt
index 27c11a6..d079aed 100644
--- a/Documentation/kvm/msr.txt
+++ b/Documentation/kvm/msr.txt
@@ -154,9 +154,10 @@ MSR_KVM_SYSTEM_TIME: 0x12
 MSR_KVM_ASYNC_PF_EN: 0x4b564d02
data: Bits 63-6 hold 64-byte aligned physical address of a
64 byte memory area which must be in guest RAM and must be
-   zeroed. Bits 5-1 are reserved and should be zero. Bit 0 is 1
+   zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
when asynchronous page faults are enabled on the vcpu 0 when
-   disabled.
+   disabled. Bit 1 is 1 if asynchronous page faults can be injected
+   when vcpu is in cpl == 0.
 
First 4 byte of 64 byte memory location will be written to by
the hypervisor at the time of asynchronous page fault (APF)
@@ -184,4 +185,3 @@ MSR_KVM_ASYNC_PF_EN: 0x4b564d02
 
Currently type 2 APF will be always delivered on the same vcpu as
type 1 was, but guest should not rely on that.
-
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f1868ed..d2fa951 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -422,6 +422,7 @@ struct kvm_vcpu_arch {
struct gfn_to_hva_cache data;
u64 msr_val;
u32 id;
+   bool send_user_only;
} apf;
 };
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index fbfd367..d3a1a48 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -38,6 +38,7 @@
 #define KVM_MAX_MMU_OP_BATCH   32
 
 #define KVM_ASYNC_PF_ENABLED   (1 << 0)
+#define KVM_ASYNC_PF_SEND_ALWAYS   (1 << 1)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE 1
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 47ea93e..91b3d65 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -449,6 +449,9 @@ void __cpuinit kvm_guest_cpu_init(void)
 if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
u64 pa = __pa(__get_cpu_var(apf_reason));
 
+#ifdef CONFIG_PREEMPT
+   pa |= KVM_ASYNC_PF_SEND_ALWAYS;
+#endif
wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
__get_cpu_var(apf_reason).enabled = 1;
 printk(KERN_INFO "KVM setup async PF for cpu %d\n",
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e2fc59..1e442df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1435,8 +1435,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
    gpa_t gpa = data & ~0x3f;
 
-   /* Bits 1:5 are reserved, Should be zero */
-   if (data & 0x3e)
+   /* Bits 2:5 are reserved, Should be zero */
+   if (data & 0x3c)
return 1;
 
   vcpu->arch.apf.msr_val = data;
@@ -1450,6 +1450,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
    if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa))
return 1;
 
+   vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
kvm_async_pf_wakeup_all(vcpu);
return 0;
 }
-- 
1.7.1



[PATCH v7 05/12] Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvm.c   |   11 +++
 arch/x86/kernel/kvmclock.c  |   13 +
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 7b562b6..e3faaaf 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,7 @@ struct kvm_mmu_op_release_pt {
 #include <asm/processor.h>
 
 extern void kvmclock_init(void);
+extern int kvm_register_clock(char *txt);
 
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 63b0ec8..e6db179 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -231,10 +231,21 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+#ifdef CONFIG_SMP
+static void __init kvm_smp_prepare_boot_cpu(void)
+{
+   WARN_ON(kvm_register_clock("primary cpu clock"));
+   native_smp_prepare_boot_cpu();
+}
+#endif
+
 void __init kvm_guest_init(void)
 {
if (!kvm_para_available())
return;
 
paravirt_ops_setup();
+#ifdef CONFIG_SMP
+   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+#endif
 }
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca43ce3..f98d3ea 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,7 +125,7 @@ static struct clocksource kvm_clock = {
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-static int kvm_register_clock(char *txt)
+int kvm_register_clock(char *txt)
 {
int cpu = smp_processor_id();
int low, high, ret;
@@ -152,14 +152,6 @@ static void __cpuinit kvm_setup_secondary_clock(void)
 }
 #endif
 
-#ifdef CONFIG_SMP
-static void __init kvm_smp_prepare_boot_cpu(void)
-{
-   WARN_ON(kvm_register_clock("primary cpu clock"));
-   native_smp_prepare_boot_cpu();
-}
-#endif
-
 /*
  * After the clock is registered, the host will keep writing to the
  * registered memory location. If the guest happens to shutdown, this memory
@@ -206,9 +198,6 @@ void __init kvmclock_init(void)
x86_cpuinit.setup_percpu_clockev =
kvm_setup_secondary_clock;
 #endif
-#ifdef CONFIG_SMP
-   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
-#endif
machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
machine_ops.crash_shutdown  = kvm_crash_shutdown;
-- 
1.7.1



[PATCH v7 03/12] Retry fault before vmentry

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

When a page is swapped in, it is mapped into guest memory only after the
guest tries to access it again and generates another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page. Do it only when tdp is enabled for now. Shadow paging case is
more complicated. CR[034] and EFER registers should be switched before
doing mapping and then switched back.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |4 +++-
 arch/x86/kvm/mmu.c  |   16 
 arch/x86/kvm/paging_tmpl.h  |6 +++---
 arch/x86/kvm/x86.c  |7 +++
 virt/kvm/async_pf.c |2 ++
 5 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 043e29e..96aca44 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -241,7 +241,7 @@ struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
-   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
+   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool no_apf);
void (*inject_page_fault)(struct kvm_vcpu *vcpu);
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
@@ -839,6 +839,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f01e89a..11d152b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2568,7 +2568,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 }
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
-   u32 error_code)
+   u32 error_code, bool no_apf)
 {
gfn_t gfn;
int r;
@@ -2604,8 +2604,8 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
 return kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
-static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
-pfn_t *pfn)
+static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
+gva_t gva, pfn_t *pfn)
 {
bool async;
 
@@ -2616,7 +2616,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 
put_page(pfn_to_page(*pfn));
 
-   if (can_do_async_pf(vcpu)) {
+   if (!no_apf && can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(async, *pfn);
if (kvm_find_async_pf_gfn(vcpu, gfn)) {
trace_kvm_async_pf_doublefault(gva, gfn);
@@ -2631,8 +2631,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
return false;
 }
 
-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
-   u32 error_code)
+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
+ bool no_apf)
 {
pfn_t pfn;
int r;
@@ -2654,7 +2654,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-   if (try_async_pf(vcpu, gfn, gpa, pfn))
+   if (try_async_pf(vcpu, no_apf, gfn, gpa, pfn))
return 0;
 
/* mmio */
@@ -3317,7 +3317,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
int r;
enum emulation_result er;
 
-   r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
+   r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
    if (r < 0)
goto out;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index c45376d..d6b281e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -527,8 +527,8 @@ out_gpte_changed:
  *  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
  *   a negative value on error.
  */
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
-  u32 error_code)
+static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
+bool no_apf)
 {
int write_fault = error_code  PFERR_WRITE_MASK;
int user_fault = error_code  PFERR_USER_MASK;
@@ -569,7 +569,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, 

[PATCH v7 01/12] Add get_user_pages() variant that fails if major fault is required.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

This patch adds a get_user_pages() variant that only succeeds if getting
a reference to a page doesn't require a major fault.
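
[Editor's note] The mechanism can be pictured as a flag threaded down to the
fault handlers: any handler that would need I/O (a major fault) bails out
when the flag is set. A standalone sketch with local stand-in constants (the
real values come from linux/mm.h; VM_FAULT_ERROR is actually a mask of error
bits):

```c
/* Sketch of the FAULT_FLAG_MINOR control flow added by the patch: if
 * the page is not already present (servicing it would mean disk I/O),
 * a flagged fault fails immediately instead of blocking, which is what
 * lets get_user_pages_noio() return without doing I/O. */
#define FAULT_FLAG_MINOR 0x08
#define VM_FAULT_MAJOR   0x04
#define VM_FAULT_ERROR   0x80   /* stand-in; really a mask of error bits */

static int fault_sketch(unsigned int flags, int page_present)
{
    if (!page_present) {
        if (flags & FAULT_FLAG_MINOR)
            return VM_FAULT_MAJOR | VM_FAULT_ERROR; /* refuse to do I/O */
        /* ...start readahead / disk I/O here... */
        return VM_FAULT_MAJOR;
    }
    return 0; /* minor fault: page already in the page cache */
}
```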

Reviewed-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 fs/ncpfs/mmap.c|2 ++
 include/linux/mm.h |5 +
 mm/filemap.c   |3 +++
 mm/memory.c|   31 ---
 mm/shmem.c |8 +++-
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/fs/ncpfs/mmap.c b/fs/ncpfs/mmap.c
index 56f5b3a..b9c4f36 100644
--- a/fs/ncpfs/mmap.c
+++ b/fs/ncpfs/mmap.c
@@ -39,6 +39,8 @@ static int ncp_file_mmap_fault(struct vm_area_struct *area,
int bufsize;
int pos; /* XXX: loff_t ? */
 
+   if (vmf->flags & FAULT_FLAG_MINOR)
+   return VM_FAULT_MAJOR | VM_FAULT_ERROR;
/*
 * ncpfs has nothing against high pages as long
 * as recvmsg and memset works on it
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74949fb..da32900 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -144,6 +144,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_WRITE   0x01    /* Fault was a write access */
 #define FAULT_FLAG_NONLINEAR   0x02    /* Fault was via a nonlinear mapping */
 #define FAULT_FLAG_MKWRITE 0x04    /* Fault was mkwrite of existing pte */
+#define FAULT_FLAG_MINOR   0x08    /* Do only minor fault */
 
 /*
  * This interface is used by x86 PAT code to identify a pfn mapping that is
@@ -848,6 +849,9 @@ extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int nr_pages, int write, int force,
struct page **pages, struct vm_area_struct **vmas);
+int get_user_pages_noio(struct task_struct *tsk, struct mm_struct *mm,
+   unsigned long start, int nr_pages, int write, int force,
+   struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
 struct page *get_dump_page(unsigned long addr);
@@ -1394,6 +1398,7 @@ struct page *follow_page(struct vm_area_struct *, 
unsigned long address,
 #define FOLL_GET   0x04    /* do get_page on page */
 #define FOLL_DUMP  0x08    /* give error on hole if it would be zero */
 #define FOLL_FORCE 0x10    /* get_user_pages read/write w/o permission */
+#define FOLL_MINOR 0x20    /* do only minor page faults */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3d4df44..ef28b6d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1548,6 +1548,9 @@ int filemap_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf)
goto no_cached_page;
}
} else {
+   if (vmf->flags & FAULT_FLAG_MINOR)
+   return VM_FAULT_MAJOR | VM_FAULT_ERROR;
+
/* No page in the page cache at all */
do_sync_mmap_readahead(vma, ra, file, offset);
count_vm_event(PGMAJFAULT);
diff --git a/mm/memory.c b/mm/memory.c
index 0e18b4d..b221458 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1441,10 +1441,13 @@ int __get_user_pages(struct task_struct *tsk, struct 
mm_struct *mm,
cond_resched();
while (!(page = follow_page(vma, start, foll_flags))) {
int ret;
+   unsigned int fault_fl =
+   ((foll_flags & FOLL_WRITE) ?
+   FAULT_FLAG_WRITE : 0) |
+   ((foll_flags & FOLL_MINOR) ?
+   FAULT_FLAG_MINOR : 0);
 
-   ret = handle_mm_fault(mm, vma, start,
-   (foll_flags & FOLL_WRITE) ?
-   FAULT_FLAG_WRITE : 0);
+   ret = handle_mm_fault(mm, vma, start, fault_fl);
 
	if (ret & VM_FAULT_ERROR) {
	if (ret & VM_FAULT_OOM)
@@ -1452,6 +1455,8 @@ int __get_user_pages(struct task_struct *tsk, struct 
mm_struct *mm,
	if (ret &
(VM_FAULT_HWPOISON|VM_FAULT_SIGBUS))
return i ? i : -EFAULT;
+   else if (ret & VM_FAULT_MAJOR)
+   return i ? i : -EFAULT;
BUG();
}
	if (ret & VM_FAULT_MAJOR)
@@ -1562,6 +1567,23 @@ int get_user_pages(struct 

[PATCH v7 12/12] Send async PF when guest is not in userspace too.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

If the guest indicates that it can handle async page faults in kernel mode
too, send them there as well, but only if interrupts are enabled.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/x86.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e442df..51cff2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6248,7 +6248,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu 
*vcpu,
	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
-   kvm_x86_ops->get_cpl(vcpu) == 0)
+   (vcpu->arch.apf.send_user_only &&
+kvm_x86_ops->get_cpl(vcpu) == 0))
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
	vcpu->arch.fault.error_code = 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest, and if an interrupt
causes a reschedule the guest will continue to run another task.
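In userspace terms the flow is a classic hand-off: the "vcpu" blocks on a condition variable while a work item does the slow swap-in, then is woken. A minimal pthread sketch (all names here are illustrative stand-ins, not KVM API):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t page_ready_cv = PTHREAD_COND_INITIALIZER;
static bool page_ready; /* stands in for async_pf work completion */

/* Work item: "swap the page in" off the vcpu thread. */
static void *swap_in_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	page_ready = true;               /* page is now in memory */
	pthread_cond_signal(&page_ready_cv); /* "wake" the halted vcpu */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* vcpu side: schedule the work, then "halt" until the page is ready.
 * In the real patch the vcpu keeps taking interrupts while halted. */
static bool fault_and_halt(void)
{
	pthread_t worker;

	pthread_create(&worker, NULL, swap_in_work, NULL);
	pthread_mutex_lock(&lock);
	while (!page_ready)
		pthread_cond_wait(&page_ready_cv, &lock);
	pthread_mutex_unlock(&lock);
	pthread_join(worker, NULL);
	return page_ready;
}
```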

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |   18 
 arch/x86/kvm/Kconfig|1 +
 arch/x86/kvm/Makefile   |1 +
 arch/x86/kvm/mmu.c  |   52 +++-
 arch/x86/kvm/paging_tmpl.h  |4 +-
 arch/x86/kvm/x86.c  |  112 ++-
 include/linux/kvm_host.h|   31 +++
 include/trace/events/kvm.h  |   90 ++
 virt/kvm/Kconfig|3 +
 virt/kvm/async_pf.c |  190 +++
 virt/kvm/async_pf.h |   36 
 virt/kvm/kvm_main.c |   57 +---
 12 files changed, 578 insertions(+), 17 deletions(-)
 create mode 100644 virt/kvm/async_pf.c
 create mode 100644 virt/kvm/async_pf.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e209078..043e29e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -83,11 +83,14 @@
 #define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 
+#define ASYNC_PF_PER_VCPU 64
+
 extern spinlock_t kvm_lock;
 extern struct list_head vm_list;
 
 struct kvm_vcpu;
 struct kvm;
+struct kvm_async_pf;
 
 enum kvm_reg {
VCPU_REGS_RAX = 0,
@@ -412,6 +415,11 @@ struct kvm_vcpu_arch {
u64 hv_vapic;
 
cpumask_var_t wbinvd_dirty_mask;
+
+   struct {
+   bool halted;
+   gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
+   } apf;
 };
 
 struct kvm_arch {
@@ -585,6 +593,10 @@ struct kvm_x86_ops {
const struct trace_print_flags *exit_reasons_str;
 };
 
+struct kvm_arch_async_pf {
+   gfn_t gfn;
+};
+
 extern struct kvm_x86_ops *kvm_x86_ops;
 
 int kvm_mmu_module_init(void);
@@ -823,4 +835,10 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ddc131f..50f6364 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
+   select KVM_ASYNC_PF
select USER_RETURN_NOTIFIER
select KVM_MMIO
---help---
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..c53bf19 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,6 +9,7 @@ kvm-y   += $(addprefix ../../../virt/kvm/, 
kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
+kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
   i8254.o timer.o
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 908ea54..f01e89a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -18,9 +18,11 @@
  *
  */
 
+#include "irq.h"
 #include "mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
+#include "x86.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -2585,6 +2587,50 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
 error_code & PFERR_WRITE_MASK, gfn);
 }
 
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
+{
+   struct kvm_arch_async_pf arch;
+   arch.gfn = gfn;
+
+   return kvm_setup_async_pf(vcpu, gva, gfn, arch);
+}
+
+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+   if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
+kvm_event_needs_reinjection(vcpu)))
+   return false;
+
+   return kvm_x86_ops->interrupt_allowed(vcpu);
+}
+
+static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
+pfn_t *pfn)
+{
+   bool async;
+
+   *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+
+   if (!async)
+   return false; /* *pfn has correct page already */
+
+   put_page(pfn_to_page(*pfn));
+
+   if (can_do_async_pf(vcpu)) {
+   trace_kvm_try_async_get_page(async, *pfn);
+ 

[PATCH v7 00/12] KVM: Add host swap event notifications for PV guest

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

KVM virtualizes guest memory by means of shadow pages or HW assistance
like NPT/EPT. Not all memory used by a guest is mapped into the guest
address space or even present in a host memory at any given time.
When vcpu tries to access memory page that is not mapped into the guest
address space KVM is notified about it. KVM maps the page into the guest
address space and resumes vcpu execution. If the page is swapped out from
the host memory vcpu execution is suspended till the page is swapped
into the memory again. This is inefficient since vcpu can do other work
(run other task or serve interrupts) while page gets swapped in.

The patch series tries to mitigate this problem by introducing two
mechanisms. The first one is used with non-PV guest and it works like
this: when vcpu tries to access swapped out page it is halted and
requested page is swapped in by another thread. That way vcpu can still
process interrupts while io is happening in parallel and, with any luck,
interrupt will cause the guest to schedule another task on the vcpu, so
it will have work to do instead of waiting for the page to be swapped in.

The second mechanism introduces PV notification about swapped page state to
a guest (asynchronous page fault). Instead of halting vcpu upon access to
swapped out page and hoping that some interrupt will cause reschedule we
immediately inject asynchronous page fault to the vcpu.  PV aware guest
knows that upon receiving such exception it should schedule another task
to run on the vcpu. Current task is put to sleep until another kind of
asynchronous page fault is received that notifies the guest that page
is now in the host memory, so task that waits for it can run again.

To measure performance benefits I use a simple benchmark program (below)
that starts a number of threads. Some of them do work (increment a counter),
others access a huge array in random locations trying to generate host page
faults. The size of the array is smaller than guest memory but bigger
than host memory so we are guaranteed that the host will swap out part of
the array.
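The benchmark source referenced as "(below)" did not survive in this digest. A rough userspace sketch of the described workload — faulting threads touching random offsets of a large array, working threads bumping a shared counter — might look like the following; it is iteration-bounded rather than time-bounded, and `run_bm` and all sizes are placeholders:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

static atomic_ulong total_work;     /* work done by "-w" threads */
static unsigned char *array;        /* large array for "-f" threads */
static size_t array_size;

static void *worker(void *arg)      /* working thread: just do work */
{
	unsigned long iters = (unsigned long)(uintptr_t)arg;

	for (unsigned long i = 0; i < iters; i++)
		atomic_fetch_add(&total_work, 1);
	return NULL;
}

static void *faulter(void *arg)     /* faulting thread: touch random spots */
{
	unsigned long iters = (unsigned long)(uintptr_t)arg;
	unsigned int seed = 42;

	for (unsigned long i = 0; i < iters; i++) {
		seed = seed * 1103515245u + 12345u; /* simple LCG */
		array[seed % array_size]++;
	}
	return NULL;
}

/* Run nfault faulting and nwork working threads; return total work done. */
static unsigned long run_bm(int nfault, int nwork, unsigned long iters,
			    size_t bytes)
{
	pthread_t tids[nfault + nwork];
	int i;

	array_size = bytes;
	array = calloc(1, bytes);
	atomic_store(&total_work, 0);
	for (i = 0; i < nfault; i++)
		pthread_create(&tids[i], NULL, faulter, (void *)(uintptr_t)iters);
	for (; i < nfault + nwork; i++)
		pthread_create(&tids[i], NULL, worker, (void *)(uintptr_t)iters);
	for (i = 0; i < nfault + nwork; i++)
		pthread_join(tids[i], NULL);
	free(array);
	return atomic_load(&total_work);
}
```

On a host that actually swaps, the array would be sized above host memory; the tiny sizes used for checking the sketch below obviously do not trigger swapping.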

I ran the benchmark on three setups: with current kvm.git (master),
with my patch series + non-pv guest (nonpv) and with my patch series +
pv guest (pv).

Each guest had 4 cpus and 2G memory and was launched inside 512M memory
container. The command line was ./bm -f 4 -w 4 -t 60 (run 4 faulting
threads and 4 working threads for a minute).

Below is the total amount of work each guest managed to do
(average of 10 runs):
	 total work      std error
master: 122789420615 (3818565029)
nonpv:  138455939001 (773774299)
pv: 234351846135 (10461117116)

Changes:
 v1->v2
   Use MSR instead of hypercall.
   Move most of the code into arch independent place.
   halt inside a guest instead of doing wait for page hypercall if
preemption is disabled.
 v2->v3
   Use MSR from range 0x4b564dxx.
   Add slot version tracking.
   Support migration by restarting all guest processes after migration.
   Drop patch that tracked preemptability for non-preemptable kernels
due to performance concerns. Send async PF to non-preemptable
guests only when vcpu is executing userspace code.
 v3->v4
  Provide alternative page fault handler in PV guest instead of adding hook to
   standard page fault handler and patch it out on non-PV guests.
  Allow only limited number of outstanding async page fault per vcpu.
  Unify  gfn_to_pfn and gfn_to_pfn_async code.
  Cancel outstanding slow work on reset.
 v4->v5
  Move async pv cpu initialization into cpu hotplug notifier.
  Use GFP_NOWAIT instead of GFP_ATOMIC for allocation that shouldn't sleep
  Process KVM_REQ_MMU_SYNC even in page_fault_other_cr3() before changing
   cr3 back
 v5->v6
  Too many. Will list only major changes here.
  Replace slow work with work queues.
  Halt vcpu for non-pv guests.
  Handle async PF in nested SVM mode.
  Do not prefault swapped in page for non tdp case.
 v6->v7
  Fix GUP fail in work thread problem
  Do prefault only if mmu is in direct map mode
  Use cpu->request to ask for vcpu halt (drop optimization that tried to
   skip non-present apf injection if page is swapped in before next vmentry)
  Keep track of synthetic halt in separate state to prevent it from leaking
   during migration.
  Fix memslot tracking problems.
  More documentation.
  Other small comments are addressed

Gleb Natapov (12):
  Add get_user_pages() variant that fails if major fault is required.
  Halt vcpu if page it tries to access is swapped out.
  Retry fault before vmentry
  Add memory slot versioning and use it to provide fast guest write interface
  Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
  Add PV MSR to enable asynchronous page faults delivery.
  Add async PF initialization to PV guest.
  Handle async PF in a guest.
  Inject asynchronous page fault into a PV guest if page is swapped out.
  Handle async PF in non preemptable context
  Let host know whether the guest can handle async 

[PATCH v7 08/12] Handle async PF in a guest.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.
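The guest-side contract is: read the per-cpu reason word, reset it to zero before anything that could take a normal #PF, then dispatch. A userspace mock of that protocol (`apf_reason` stands in for the shared kvm_vcpu_pv_apf_data area, and `classify_fault` is a hypothetical dispatcher; the constants match the kvm_para.h hunk):

```c
#include <stdint.h>
#include <string.h>

#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
#define KVM_PV_REASON_PAGE_READY       2

static uint32_t apf_reason; /* stands in for the shared 64-byte area */

/* Mirror of kvm_read_and_reset_pf_reason(): the reason must be cleared
 * before anything that could take a normal page fault runs, or a real
 * #PF would be misread as an APF. */
static uint32_t read_and_reset_pf_reason(void)
{
	uint32_t reason = apf_reason;

	apf_reason = 0;
	return reason;
}

/* Reason 0 means a regular #PF, which is forwarded to do_page_fault(). */
static const char *classify_fault(void)
{
	switch (read_and_reset_pf_reason()) {
	case KVM_PV_REASON_PAGE_NOT_PRESENT:
		return "wait";    /* kvm_async_pf_task_wait(token) */
	case KVM_PV_REASON_PAGE_READY:
		return "wake";    /* kvm_async_pf_task_wake(token) */
	default:
		return "regular"; /* normal page fault path */
	}
}
```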

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |   12 +++
 arch/x86/include/asm/traps.h|1 +
 arch/x86/kernel/entry_32.S  |   10 ++
 arch/x86/kernel/entry_64.S  |3 +
 arch/x86/kernel/kvm.c   |  181 +++
 arch/x86/kvm/svm.c  |   45 --
 6 files changed, 243 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2315398..fbfd367 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,9 @@ struct kvm_mmu_op_release_pt {
__u64 pt_phys;
 };
 
+#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
+#define KVM_PV_REASON_PAGE_READY 2
+
 struct kvm_vcpu_pv_apf_data {
__u32 reason;
__u8 pad[60];
@@ -171,8 +174,17 @@ static inline unsigned int kvm_arch_para_features(void)
 
 #ifdef CONFIG_KVM_GUEST
 void __init kvm_guest_init(void);
+void kvm_async_pf_task_wait(u32 token);
+void kvm_async_pf_task_wake(u32 token);
+u32 kvm_read_and_reset_pf_reason(void);
 #else
 #define kvm_guest_init() do { } while (0)
+#define kvm_async_pf_task_wait(T) do {} while(0)
+#define kvm_async_pf_task_wake(T) do {} while(0)
+static inline u32 kvm_read_and_reset_pf_reason(void)
+{
+   return 0;
+}
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index f66cda5..0310da6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -30,6 +30,7 @@ asmlinkage void segment_not_present(void);
 asmlinkage void stack_segment(void);
 asmlinkage void general_protection(void);
 asmlinkage void page_fault(void);
+asmlinkage void async_page_fault(void);
 asmlinkage void spurious_interrupt_bug(void);
 asmlinkage void coprocessor_error(void);
 asmlinkage void alignment_check(void);
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 227d009..e6e7273 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1496,6 +1496,16 @@ ENTRY(general_protection)
CFI_ENDPROC
 END(general_protection)
 
+#ifdef CONFIG_KVM_GUEST
+ENTRY(async_page_fault)
+   RING0_EC_FRAME
+   pushl $do_async_page_fault
+   CFI_ADJUST_CFA_OFFSET 4
+   jmp error_code
+   CFI_ENDPROC
+END(async_page_fault)
+#endif
+
 /*
  * End of kprobes section
  */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 17be5ec..def98c3 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1349,6 +1349,9 @@ errorentry xen_stack_segment do_stack_segment
 #endif
 errorentry general_protection do_general_protection
 errorentry page_fault do_page_fault
+#ifdef CONFIG_KVM_GUEST
+errorentry async_page_fault do_async_page_fault
+#endif
 #ifdef CONFIG_X86_MCE
 paranoidzeroentry machine_check *machine_check_vector(%rip)
 #endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 032d03b..d564063 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,8 +29,14 @@
 #include <linux/hardirq.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+#include <linux/hash.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/kprobes.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
+#include <asm/traps.h>
+#include <asm/desc.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -64,6 +70,168 @@ static void kvm_io_delay(void)
 {
 }
 
+#define KVM_TASK_SLEEP_HASHBITS 8
+#define KVM_TASK_SLEEP_HASHSIZE (1<<KVM_TASK_SLEEP_HASHBITS)
+
+struct kvm_task_sleep_node {
+   struct hlist_node link;
+   wait_queue_head_t wq;
+   u32 token;
+   int cpu;
+};
+
+static struct kvm_task_sleep_head {
+   spinlock_t lock;
+   struct hlist_head list;
+} async_pf_sleepers[KVM_TASK_SLEEP_HASHSIZE];
+
+static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head 
*b,
+ u32 token)
+{
+   struct hlist_node *p;
+
+   hlist_for_each(p, &b->list) {
+   struct kvm_task_sleep_node *n =
+   hlist_entry(p, typeof(*n), link);
+   if (n->token == token)
+   return n;
+   }
+
+   return NULL;
+}
+
+void kvm_async_pf_task_wait(u32 token)
+{
+   u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+   struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+   struct kvm_task_sleep_node n, *e;
+   DEFINE_WAIT(wait);
+
+   spin_lock(&b->lock);
+   e = _find_apf_task(b, token);
+   if (e) {
+   /* dummy entry exist - wake up was delivered ahead of PF */
+ 

[PATCH v7 10/12] Handle async PF in non preemptable context

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

If an async page fault is received by the idle task or when preempt_count
is not zero the guest cannot reschedule, so do "sti; hlt" and wait for the
page to be ready. The vcpu can still process interrupts while it waits for
the page to be ready.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kernel/kvm.c |   40 ++--
 1 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d564063..47ea93e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -37,6 +37,7 @@
 #include asm/cpu.h
 #include asm/traps.h
 #include asm/desc.h
+#include <asm/tlbflush.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -78,6 +79,8 @@ struct kvm_task_sleep_node {
wait_queue_head_t wq;
u32 token;
int cpu;
+   bool halted;
+   struct mm_struct *mm;
 };
 
 static struct kvm_task_sleep_head {
@@ -106,6 +109,11 @@ void kvm_async_pf_task_wait(u32 token)
struct kvm_task_sleep_head *b = async_pf_sleepers[key];
struct kvm_task_sleep_node n, *e;
DEFINE_WAIT(wait);
+   int cpu, idle;
+
+   cpu = get_cpu();
+   idle = idle_cpu(cpu);
+   put_cpu();
 
	spin_lock(&b->lock);
e = _find_apf_task(b, token);
@@ -119,19 +127,33 @@ void kvm_async_pf_task_wait(u32 token)
 
n.token = token;
n.cpu = smp_processor_id();
+   n.mm = current->active_mm;
+   n.halted = idle || preempt_count() > 1;
+   atomic_inc(&n.mm->mm_count);
	init_waitqueue_head(&n.wq);
	hlist_add_head(&n.link, &b->list);
	spin_unlock(&b->lock);
 
for (;;) {
-   prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+   if (!n.halted)
+   prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
	if (hlist_unhashed(&n.link))
break;
-   local_irq_enable();
-   schedule();
-   local_irq_disable();
+
+   if (!n.halted) {
+   local_irq_enable();
+   schedule();
+   local_irq_disable();
+   } else {
+   /*
+* We cannot reschedule. So halt.
+*/
+   native_safe_halt();
+   local_irq_disable();
+   }
}
-   finish_wait(&n.wq, &wait);
+   if (!n.halted)
+   finish_wait(&n.wq, &wait);
 
return;
 }
@@ -140,7 +162,12 @@ EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
 static void apf_task_wake_one(struct kvm_task_sleep_node *n)
 {
	hlist_del_init(&n->link);
-   if (waitqueue_active(&n->wq))
+   if (!n->mm)
+   return;
+   mmdrop(n->mm);
+   if (n->halted)
+   smp_send_reschedule(n->cpu);
+   else if (waitqueue_active(&n->wq))
	wake_up(&n->wq);
 }
 
@@ -193,6 +220,7 @@ again:
}
	n->token = token;
	n->cpu = smp_processor_id();
+   n->mm = NULL;
	init_waitqueue_head(&n->wq);
	hlist_add_head(&n->link, &b->list);
} else
-- 
1.7.1



[PATCH v7 06/12] Add PV MSR to enable asynchronous page faults delivery.

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

Guest enables async PF vcpu functionality using this MSR.
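Per the msr.txt text added by this patch, the MSR value packs a 64-byte-aligned physical address together with the enable bit, bits 5-1 being reserved. A hypothetical userspace helper that builds such a value (returning 0 for a misaligned address is an assumption of this sketch, not the MSR's defined error behaviour):

```c
#include <stdint.h>

#define KVM_ASYNC_PF_ENABLED (1ULL << 0)   /* bit 0, from the patch */

/* Build the MSR_KVM_ASYNC_PF_EN value: bits 63-6 carry the 64-byte
 * aligned physical address of the apf data area, bits 5-1 stay zero
 * (reserved), bit 0 enables delivery. The alignment guarantees the
 * address and the flag bits never overlap. */
static uint64_t apf_msr_value(uint64_t pa, int enable)
{
	if (pa & 0x3f)          /* must be 64-byte aligned */
		return 0;
	return pa | (enable ? KVM_ASYNC_PF_ENABLED : 0);
}
```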

Reviewed-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kvm/cpuid.txt |3 +++
 Documentation/kvm/msr.txt   |   36 +++-
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/include/asm/kvm_para.h |4 
 arch/x86/kvm/x86.c  |   38 --
 include/linux/kvm.h |1 +
 include/linux/kvm_host.h|1 +
 virt/kvm/async_pf.c |   20 
 8 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/Documentation/kvm/cpuid.txt b/Documentation/kvm/cpuid.txt
index 14a12ea..8820685 100644
--- a/Documentation/kvm/cpuid.txt
+++ b/Documentation/kvm/cpuid.txt
@@ -36,6 +36,9 @@ KVM_FEATURE_MMU_OP || 2 || deprecated.
 KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
||   || 0x4b564d00 and 0x4b564d01
 --
+KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
+   ||   || writing to msr 0x4b564d02
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/Documentation/kvm/msr.txt b/Documentation/kvm/msr.txt
index 8ddcfe8..27c11a6 100644
--- a/Documentation/kvm/msr.txt
+++ b/Documentation/kvm/msr.txt
@@ -3,7 +3,6 @@ Glauber Costa glom...@redhat.com, Red Hat Inc, 2010
 =
 
 KVM makes use of some custom MSRs to service some requests.
-At present, this facility is only used by kvmclock.
 
 Custom MSRs have a range reserved for them, that goes from
 0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
@@ -151,3 +150,38 @@ MSR_KVM_SYSTEM_TIME: 0x12
return PRESENT;
} else
return NON_PRESENT;
+
+MSR_KVM_ASYNC_PF_EN: 0x4b564d02
+   data: Bits 63-6 hold 64-byte aligned physical address of a
+   64 byte memory area which must be in guest RAM and must be
+   zeroed. Bits 5-1 are reserved and should be zero. Bit 0 is 1
+   when asynchronous page faults are enabled on the vcpu, 0 when
+   disabled.
+
+   The first 4 bytes of the 64 byte memory location will be written
+   to by the hypervisor at the time of asynchronous page fault (APF)
+   injection to indicate the type of asynchronous page fault. A value
+   of 1 means that the page referred to by the page fault is not
+   present. A value of 2 means that the page is now available. Disabling
+   interrupts inhibits APFs. The guest must not enable interrupts
+   before the reason is read, or it may be overwritten by another
+   APF. Since APF uses the same exception vector as a regular page
+   fault, the guest must reset the reason to 0 before it does
+   something that can generate a normal page fault. If during a page
+   fault the APF reason is 0, it means that this is a regular page
+   fault.
+
+   During delivery of a type 1 APF cr2 contains a token that will
+   be used to notify the guest when the missing page becomes
+   available. When the page becomes available a type 2 APF is sent
+   with cr2 set to the token associated with the page. There is a
+   special kind of token, 0x, which tells the vcpu that it should
+   wake up all processes waiting for APFs and that no individual
+   type 2 APFs will be sent.
+
+   If APF is disabled while there are outstanding APFs, they will
+   not be delivered.
+
+   Currently a type 2 APF will always be delivered on the same vcpu
+   as the type 1 was, but the guest should not rely on that.
+
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 96aca44..26b2064 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -419,6 +419,8 @@ struct kvm_vcpu_arch {
struct {
bool halted;
gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
+   struct gfn_to_hva_cache data;
+   u64 msr_val;
} apf;
 };
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index e3faaaf..8662ae0 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,7 @@
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2   3
+#define KVM_FEATURE_ASYNC_PF   4
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -32,9 +33,12 @@
 /* Custom 

[PATCH v7 04/12] Add memory slot versioning and use it to provide fast guest write interface

2010-10-14 Thread y
From: Gleb Natapov g...@redhat.com

Keep track of memslot changes by keeping a generation number in the
memslots structure. Provide a kvm_write_guest_cached() function that
skips the gfn_to_hva() translation if memslots were not changed since
the previous invocation.
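The caching idea is generic: remember the generation current at translation time and re-translate only when it no longer matches. A userspace sketch of the pattern (`slow_translate` stands in for gfn_to_hva(), and the names are hypothetical; in the real code the cache is set up by kvm_gfn_to_hva_cache_init()):

```c
#include <stdint.h>

static uint64_t memslots_generation;    /* bumped on every memslot change */
static unsigned long slow_translations; /* counts "gfn_to_hva" calls */

struct hva_cache {
	uint64_t generation; /* generation the cached hva was computed at */
	uint64_t gpa;
	uint64_t hva;
};

static uint64_t slow_translate(uint64_t gpa)
{
	slow_translations++;
	return 0x700000000000ULL + gpa; /* placeholder identity-ish mapping */
}

/* Mirror of kvm_write_guest_cached()'s fast path: reuse the cached hva
 * while the generation still matches; re-walk the memslots otherwise. */
static uint64_t cached_translate(struct hva_cache *c, uint64_t gpa)
{
	if (c->generation != memslots_generation || c->gpa != gpa) {
		c->hva = slow_translate(gpa);
		c->gpa = gpa;
		c->generation = memslots_generation;
	}
	return c->hva;
}
```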

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 include/linux/kvm_host.h  |7 
 include/linux/kvm_types.h |7 
 virt/kvm/kvm_main.c   |   75 +---
 3 files changed, 77 insertions(+), 12 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9a9b017..dda88f2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -199,6 +199,7 @@ struct kvm_irq_routing_table {};
 
 struct kvm_memslots {
int nmemslots;
+   u64 generation;
struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
KVM_PRIVATE_MEM_SLOTS];
 };
@@ -352,12 +353,18 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, 
const void *data,
 int offset, int len);
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
unsigned long len);
+int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len);
+int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+ gpa_t gpa);
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
+gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 7ac0d4e..fa7cc72 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -67,4 +67,11 @@ struct kvm_lapic_irq {
u32 dest_id;
 };
 
+struct gfn_to_hva_cache {
+   u64 generation;
+   gpa_t gpa;
+   unsigned long hva;
+   struct kvm_memory_slot *memslot;
+};
+
 #endif /* __KVM_TYPES_H__ */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 238079e..5d57ec9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -687,6 +687,7 @@ skip_lpage:
	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
	if (mem->slot >= slots->nmemslots)
	slots->nmemslots = mem->slot + 1;
+   slots->generation++;
	slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;

	old_memslots = kvm->memslots;
@@ -723,6 +724,7 @@ skip_lpage:
	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
	if (mem->slot >= slots->nmemslots)
	slots->nmemslots = mem->slot + 1;
+   slots->generation++;
 
/* actual memory is freed via old in kvm_free_physmem_slot below */
if (!npages) {
@@ -853,10 +855,10 @@ int kvm_is_error_hva(unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(kvm_is_error_hva);
 
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+static struct kvm_memory_slot *__gfn_to_memslot(struct kvm_memslots *slots,
+   gfn_t gfn)
 {
int i;
-   struct kvm_memslots *slots = kvm_memslots(kvm);
 
	for (i = 0; i < slots->nmemslots; ++i) {
	struct kvm_memory_slot *memslot = &slots->memslots[i];
@@ -867,6 +869,11 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, 
gfn_t gfn)
}
return NULL;
 }
+
+struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+{
+   return __gfn_to_memslot(kvm_memslots(kvm), gfn);
+}
 EXPORT_SYMBOL_GPL(gfn_to_memslot);
 
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
@@ -929,12 +936,9 @@ int memslot_id(struct kvm *kvm, gfn_t gfn)
	return memslot - slots->memslots;
 }
 
-static unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn,
+static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 gfn_t *nr_pages)
 {
-   struct kvm_memory_slot *slot;
-
-   slot = gfn_to_memslot(kvm, gfn);
	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return bad_hva();
 
@@ -946,7 +950,7 @@ static unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t 
gfn,
 
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 {
-   return gfn_to_hva_many(kvm, gfn, NULL);
+   return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
@@ -1063,7 +1067,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, 
struct page 

Re: [PATCH v7 00/12] KVM: Add host swap event notifications for PV guest

2010-10-14 Thread Gleb Natapov
Ignore this please. Something bad happened to From: header.

On Thu, Oct 14, 2010 at 11:16:58AM +0200, y...@redhat.com wrote:
 From: Gleb Natapov g...@redhat.com
 
 KVM virtualizes guest memory by means of shadow pages or HW assistance
 like NPT/EPT. Not all memory used by a guest is mapped into the guest
 address space or even present in a host memory at any given time.
 When vcpu tries to access memory page that is not mapped into the guest
 address space KVM is notified about it. KVM maps the page into the guest
 address space and resumes vcpu execution. If the page is swapped out from
 the host memory vcpu execution is suspended till the page is swapped
 into the memory again. This is inefficient since vcpu can do other work
 (run other task or serve interrupts) while page gets swapped in.
 
 The patch series tries to mitigate this problem by introducing two
 mechanisms. The first one is used with non-PV guest and it works like
 this: when vcpu tries to access swapped out page it is halted and
 requested page is swapped in by another thread. That way vcpu can still
 process interrupts while io is happening in parallel and, with any luck,
 interrupt will cause the guest to schedule another task on the vcpu, so
 it will have work to do instead of waiting for the page to be swapped in.
 
 The second mechanism introduces PV notification about swapped page state to
 a guest (asynchronous page fault). Instead of halting vcpu upon access to
 swapped out page and hoping that some interrupt will cause reschedule we
 immediately inject asynchronous page fault to the vcpu.  PV aware guest
 knows that upon receiving such exception it should schedule another task
 to run on the vcpu. Current task is put to sleep until another kind of
 asynchronous page fault is received that notifies the guest that page
 is now in the host memory, so task that waits for it can run again.
 
 To measure performance benefits I use a simple benchmark program (below)
 that starts number of threads. Some of them do work (increment counter),
 others access huge array in random location trying to generate host page
 faults. The size of the array is smaller then guest memory bug bigger
 then host memory so we are guarantied that host will swap out part of
 the array.
 
 I ran the benchmark on three setups: with current kvm.git (master),
 with my patch series + non-pv guest (nonpv) and with my patch series +
 pv guest (pv).
 
 Each guest had 4 cpus and 2G memory and was launched inside 512M memory
 container. The command line was ./bm -f 4 -w 4 -t 60 (run 4 faulting
 threads and 4 working threads for a minute).
 
 Below is the total amount of work each guest managed to do
 (average of 10 runs):
           total work      std error
 master: 122789420615   (3818565029)
 nonpv:  138455939001   (773774299)
 pv:     234351846135   (10461117116)
 
Changes:
 v1->v2
   Use MSR instead of hypercall.
   Move most of the code into arch independent place.
   Halt inside a guest instead of doing a wait-for-page hypercall if
    preemption is disabled.
 v2->v3
   Use MSR from range 0x4b564dxx.
   Add slot version tracking.
   Support migration by restarting all guest processes after migration.
   Drop patch that tracked preemptability for non-preemptable kernels
    due to performance concerns. Send async PF to non-preemptable
    guests only when vcpu is executing userspace code.
 v3->v4
  Provide alternative page fault handler in PV guest instead of adding hook to
   standard page fault handler and patching it out on non-PV guests.
  Allow only a limited number of outstanding async page faults per vcpu.
  Unify gfn_to_pfn and gfn_to_pfn_async code.
  Cancel outstanding slow work on reset.
 v4->v5
  Move async pv cpu initialization into cpu hotplug notifier.
  Use GFP_NOWAIT instead of GFP_ATOMIC for allocations that shouldn't sleep.
  Process KVM_REQ_MMU_SYNC even in page_fault_other_cr3() before changing
   cr3 back.
 v5->v6
  Too many. Will list only major changes here.
  Replace slow work with work queues.
  Halt vcpu for non-pv guests.
  Handle async PF in nested SVM mode.
  Do not prefault swapped-in page for the non-tdp case.
 v6->v7
  Fix GUP failure in work thread.
  Do prefault only if mmu is in direct map mode.
  Use cpu->request to ask for vcpu halt (drop optimization that tried to
   skip non-present apf injection if page is swapped in before next vmentry).
  Keep track of synthetic halt in separate state to prevent it from leaking
   during migration.
  Fix memslot tracking problems.
  More documentation.
  Other small comments are addressed.
 
 Gleb Natapov (12):
   Add get_user_pages() variant that fails if major fault is required.
   Halt vcpu if page it tries to access is swapped out.
   Retry fault before vmentry
   Add memory slot versioning and use it to provide fast guest write interface
   Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
   Add PV MSR to enable asynchronous page faults delivery.
   Add async PF 

[PATCH v7 12/12] Send async PF when guest is not in userspace too.

2010-10-14 Thread Gleb Natapov
If the guest indicates that it can handle async PF in kernel mode too,
send it, but only if interrupts are enabled.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/x86.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e442df..51cff2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6248,7 +6248,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
 	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
-	    kvm_x86_ops->get_cpl(vcpu) == 0)
+	    (vcpu->arch.apf.send_user_only &&
+	     kvm_x86_ops->get_cpl(vcpu) == 0))
 		kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 	else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
 		vcpu->arch.fault.error_code = 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 10/12] Handle async PF in non preemptable context

2010-10-14 Thread Gleb Natapov
If an async page fault is received by the idle task, or when preempt_count
is not zero, the guest cannot reschedule, so do sti; hlt and wait for the
page to become ready. The vcpu can still process interrupts while it waits
for the page.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kernel/kvm.c |   40 ++--
 1 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d564063..47ea93e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -37,6 +37,7 @@
 #include <asm/cpu.h>
 #include <asm/traps.h>
 #include <asm/desc.h>
+#include <asm/tlbflush.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -78,6 +79,8 @@ struct kvm_task_sleep_node {
wait_queue_head_t wq;
u32 token;
int cpu;
+   bool halted;
+   struct mm_struct *mm;
 };
 
 static struct kvm_task_sleep_head {
@@ -106,6 +109,11 @@ void kvm_async_pf_task_wait(u32 token)
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
 	struct kvm_task_sleep_node n, *e;
 	DEFINE_WAIT(wait);
+	int cpu, idle;
+
+	cpu = get_cpu();
+	idle = idle_cpu(cpu);
+	put_cpu();
 
 	spin_lock(&b->lock);
 	e = _find_apf_task(b, token);
@@ -119,19 +127,33 @@ void kvm_async_pf_task_wait(u32 token)
 
 	n.token = token;
 	n.cpu = smp_processor_id();
+	n.mm = current->active_mm;
+	n.halted = idle || preempt_count() > 1;
+	atomic_inc(&n.mm->mm_count);
 	init_waitqueue_head(&n.wq);
 	hlist_add_head(&n.link, &b->list);
 	spin_unlock(&b->lock);
 
 	for (;;) {
-		prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (!n.halted)
+			prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
 		if (hlist_unhashed(&n.link))
 			break;
-		local_irq_enable();
-		schedule();
-		local_irq_disable();
+
+		if (!n.halted) {
+			local_irq_enable();
+			schedule();
+			local_irq_disable();
+		} else {
+			/*
+			 * We cannot reschedule. So halt.
+			 */
+			native_safe_halt();
+			local_irq_disable();
+		}
 	}
-	finish_wait(&n.wq, &wait);
+	if (!n.halted)
+		finish_wait(&n.wq, &wait);
 
return;
 }
@@ -140,7 +162,12 @@ EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
 static void apf_task_wake_one(struct kvm_task_sleep_node *n)
 {
 	hlist_del_init(&n->link);
-	if (waitqueue_active(&n->wq))
+	if (!n->mm)
+		return;
+	mmdrop(n->mm);
+	if (n->halted)
+		smp_send_reschedule(n->cpu);
+	else if (waitqueue_active(&n->wq))
 		wake_up(&n->wq);
 }
 
@@ -193,6 +220,7 @@ again:
}
 		n->token = token;
 		n->cpu = smp_processor_id();
+		n->mm = NULL;
 		init_waitqueue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
} else
-- 
1.7.1



[PATCH v7 08/12] Handle async PF in a guest.

2010-10-14 Thread Gleb Natapov
When the async PF capability is detected, hook up a special page fault
handler that will handle async page fault events and pass other page faults
through to the regular page fault handler. Also add async PF handling to
nested SVM emulation. Async PF always generates an exit to L1, where the
vcpu thread will be scheduled out until the page is available.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |   12 +++
 arch/x86/include/asm/traps.h|1 +
 arch/x86/kernel/entry_32.S  |   10 ++
 arch/x86/kernel/entry_64.S  |3 +
 arch/x86/kernel/kvm.c   |  181 +++
 arch/x86/kvm/svm.c  |   45 --
 6 files changed, 243 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2315398..fbfd367 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,9 @@ struct kvm_mmu_op_release_pt {
__u64 pt_phys;
 };
 
+#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
+#define KVM_PV_REASON_PAGE_READY 2
+
 struct kvm_vcpu_pv_apf_data {
__u32 reason;
__u8 pad[60];
@@ -171,8 +174,17 @@ static inline unsigned int kvm_arch_para_features(void)
 
 #ifdef CONFIG_KVM_GUEST
 void __init kvm_guest_init(void);
+void kvm_async_pf_task_wait(u32 token);
+void kvm_async_pf_task_wake(u32 token);
+u32 kvm_read_and_reset_pf_reason(void);
 #else
 #define kvm_guest_init() do { } while (0)
+#define kvm_async_pf_task_wait(T) do {} while(0)
+#define kvm_async_pf_task_wake(T) do {} while(0)
+static u32 kvm_read_and_reset_pf_reason(void)
+{
+   return 0;
+}
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index f66cda5..0310da6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -30,6 +30,7 @@ asmlinkage void segment_not_present(void);
 asmlinkage void stack_segment(void);
 asmlinkage void general_protection(void);
 asmlinkage void page_fault(void);
+asmlinkage void async_page_fault(void);
 asmlinkage void spurious_interrupt_bug(void);
 asmlinkage void coprocessor_error(void);
 asmlinkage void alignment_check(void);
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 227d009..e6e7273 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1496,6 +1496,16 @@ ENTRY(general_protection)
CFI_ENDPROC
 END(general_protection)
 
+#ifdef CONFIG_KVM_GUEST
+ENTRY(async_page_fault)
+   RING0_EC_FRAME
+   pushl $do_async_page_fault
+   CFI_ADJUST_CFA_OFFSET 4
+   jmp error_code
+   CFI_ENDPROC
+END(apf_page_fault)
+#endif
+
 /*
  * End of kprobes section
  */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 17be5ec..def98c3 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1349,6 +1349,9 @@ errorentry xen_stack_segment do_stack_segment
 #endif
 errorentry general_protection do_general_protection
 errorentry page_fault do_page_fault
+#ifdef CONFIG_KVM_GUEST
+errorentry async_page_fault do_async_page_fault
+#endif
 #ifdef CONFIG_X86_MCE
 paranoidzeroentry machine_check *machine_check_vector(%rip)
 #endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 032d03b..d564063 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,8 +29,14 @@
 #include <linux/hardirq.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+#include <linux/hash.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/kprobes.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
+#include <asm/traps.h>
+#include <asm/desc.h>
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -64,6 +70,168 @@ static void kvm_io_delay(void)
 {
 }
 
+#define KVM_TASK_SLEEP_HASHBITS 8
+#define KVM_TASK_SLEEP_HASHSIZE (1<<KVM_TASK_SLEEP_HASHBITS)
+
+struct kvm_task_sleep_node {
+   struct hlist_node link;
+   wait_queue_head_t wq;
+   u32 token;
+   int cpu;
+};
+
+static struct kvm_task_sleep_head {
+   spinlock_t lock;
+   struct hlist_head list;
+} async_pf_sleepers[KVM_TASK_SLEEP_HASHSIZE];
+
+static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head 
*b,
+ u32 token)
+{
+   struct hlist_node *p;
+
+	hlist_for_each(p, &b->list) {
+		struct kvm_task_sleep_node *n =
+			hlist_entry(p, typeof(*n), link);
+		if (n->token == token)
+			return n;
+   }
+
+   return NULL;
+}
+
+void kvm_async_pf_task_wait(u32 token)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node n, *e;
+	DEFINE_WAIT(wait);
+
+	spin_lock(&b->lock);
+	e = _find_apf_task(b, token);
+	if (e) {
+		/* dummy entry exist -> wake up was delivered ahead of PF */
+		hlist_del(&e->link);
+  

[PATCH v7 09/12] Inject asynchronous page fault into a PV guest if page is swapped out.

2010-10-14 Thread Gleb Natapov
Send an async page fault to a PV guest if it accesses swapped-out memory.
The guest will choose another task to run upon receiving the fault.

Allow async page fault injection only when the guest is in user mode,
since otherwise the guest may be in a non-sleepable context and will not
be able to reschedule.

The vcpu will be halted if the guest faults on the same page again or if
the vcpu executes kernel code.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |3 ++
 arch/x86/kvm/mmu.c  |1 +
 arch/x86/kvm/x86.c  |   43 ++
 include/trace/events/kvm.h  |   17 ++-
 virt/kvm/async_pf.c |3 +-
 5 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26b2064..f1868ed 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -421,6 +421,7 @@ struct kvm_vcpu_arch {
gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
struct gfn_to_hva_cache data;
u64 msr_val;
+   u32 id;
} apf;
 };
 
@@ -596,6 +597,7 @@ struct kvm_x86_ops {
 };
 
 struct kvm_arch_async_pf {
+   u32 token;
gfn_t gfn;
 };
 
@@ -843,6 +845,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
   struct kvm_async_pf *work);
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 11d152b..463ff2e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2590,6 +2590,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
 int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
 {
struct kvm_arch_async_pf arch;
+	arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
arch.gfn = gfn;
 
return kvm_setup_async_pf(vcpu, gva, gfn, arch);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68a3a06..8e2fc59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6233,20 +6233,53 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, 
gfn_t gfn)
}
 }
 
+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
+ sizeof(val));
+}
+
 void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work)
 {
-	trace_kvm_async_pf_not_present(work->gva);
-
-	kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+	trace_kvm_async_pf_not_present(work->arch.token, work->gva);
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
+
+	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
+	    kvm_x86_ops->get_cpl(vcpu) == 0)
+		kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+	else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+		vcpu->arch.fault.error_code = 0;
+		vcpu->arch.fault.address = work->arch.token;
+		kvm_inject_page_fault(vcpu);
+	}
+   }
 }
 
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work)
 {
-	trace_kvm_async_pf_ready(work->gva);
-	kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+	trace_kvm_async_pf_ready(work->arch.token, work->gva);
+	if (is_error_page(work->page))
+		work->arch.token = ~0; /* broadcast wakeup */
+	else
+		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+
+	if ((vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) &&
+	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+		vcpu->arch.fault.error_code = 0;
+		vcpu->arch.fault.address = work->arch.token;
+		kvm_inject_page_fault(vcpu);
+	}
+}
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+{
+	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
+		return true;
+	else
+		return !kvm_event_needs_reinjection(vcpu) &&
+			kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a78a5e5..9c2cc6a 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -204,34 +204,39 @@ TRACE_EVENT(
 
 TRACE_EVENT(
kvm_async_pf_not_present,
-   TP_PROTO(u64 gva),
-   TP_ARGS(gva),
+   TP_PROTO(u64 token, u64 gva),
+   TP_ARGS(token, gva),
 
TP_STRUCT__entry(
+   __field(__u64, token)
__field(__u64, gva)
),
 

[PATCH v7 01/12] Add get_user_pages() variant that fails if major fault is required.

2010-10-14 Thread Gleb Natapov
This patch adds a get_user_pages() variant that only succeeds if getting
a reference to a page doesn't require a major fault.

Reviewed-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 fs/ncpfs/mmap.c|2 ++
 include/linux/mm.h |5 +
 mm/filemap.c   |3 +++
 mm/memory.c|   31 ---
 mm/shmem.c |8 +++-
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/fs/ncpfs/mmap.c b/fs/ncpfs/mmap.c
index 56f5b3a..b9c4f36 100644
--- a/fs/ncpfs/mmap.c
+++ b/fs/ncpfs/mmap.c
@@ -39,6 +39,8 @@ static int ncp_file_mmap_fault(struct vm_area_struct *area,
int bufsize;
int pos; /* XXX: loff_t ? */
 
+	if (vmf->flags & FAULT_FLAG_MINOR)
+   return VM_FAULT_MAJOR | VM_FAULT_ERROR;
/*
 * ncpfs has nothing against high pages as long
 * as recvmsg and memset works on it
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74949fb..da32900 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -144,6 +144,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_WRITE   0x01/* Fault was a write access */
 #define FAULT_FLAG_NONLINEAR   0x02/* Fault was via a nonlinear mapping */
 #define FAULT_FLAG_MKWRITE 0x04/* Fault was mkwrite of existing pte */
+#define FAULT_FLAG_MINOR   0x08/* Do only minor fault */
 
 /*
  * This interface is used by x86 PAT code to identify a pfn mapping that is
@@ -848,6 +849,9 @@ extern int access_process_vm(struct task_struct *tsk, 
unsigned long addr, void *
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int nr_pages, int write, int force,
struct page **pages, struct vm_area_struct **vmas);
+int get_user_pages_noio(struct task_struct *tsk, struct mm_struct *mm,
+   unsigned long start, int nr_pages, int write, int force,
+   struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
 struct page *get_dump_page(unsigned long addr);
@@ -1394,6 +1398,7 @@ struct page *follow_page(struct vm_area_struct *, 
unsigned long address,
 #define FOLL_GET   0x04/* do get_page on page */
 #define FOLL_DUMP  0x08/* give error on hole if it would be zero */
 #define FOLL_FORCE 0x10/* get_user_pages read/write w/o permission */
+#define FOLL_MINOR 0x20/* do only minor page faults */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3d4df44..ef28b6d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1548,6 +1548,9 @@ int filemap_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf)
goto no_cached_page;
}
} else {
+		if (vmf->flags & FAULT_FLAG_MINOR)
+   return VM_FAULT_MAJOR | VM_FAULT_ERROR;
+
/* No page in the page cache at all */
do_sync_mmap_readahead(vma, ra, file, offset);
count_vm_event(PGMAJFAULT);
diff --git a/mm/memory.c b/mm/memory.c
index 0e18b4d..b221458 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1441,10 +1441,13 @@ int __get_user_pages(struct task_struct *tsk, struct 
mm_struct *mm,
cond_resched();
while (!(page = follow_page(vma, start, foll_flags))) {
int ret;
+			unsigned int fault_fl =
+				((foll_flags & FOLL_WRITE) ?
+				FAULT_FLAG_WRITE : 0) |
+				((foll_flags & FOLL_MINOR) ?
+				FAULT_FLAG_MINOR : 0);
 
-			ret = handle_mm_fault(mm, vma, start,
-				(foll_flags & FOLL_WRITE) ?
-				FAULT_FLAG_WRITE : 0);
+			ret = handle_mm_fault(mm, vma, start, fault_fl);
 
 			if (ret & VM_FAULT_ERROR) {
 				if (ret & VM_FAULT_OOM)
@@ -1452,6 +1455,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				if (ret &
 				    (VM_FAULT_HWPOISON|VM_FAULT_SIGBUS))
 					return i ? i : -EFAULT;
+				else if (ret & VM_FAULT_MAJOR)
+					return i ? i : -EFAULT;
 				BUG();
 			}
 			if (ret & VM_FAULT_MAJOR)
@@ -1562,6 +1567,23 @@ int get_user_pages(struct task_struct *tsk, struct 
mm_struct 

[PATCH v7 11/12] Let host know whether the guest can handle async PF in non-userspace context.

2010-10-14 Thread Gleb Natapov
If the guest can detect that it runs in a non-preemptable context, it can
handle async PFs at any time, so let the host know that it can send async
PFs even when the guest cpu is not in userspace.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kvm/msr.txt   |6 +++---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvm.c   |3 +++
 arch/x86/kvm/x86.c  |5 +++--
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/kvm/msr.txt b/Documentation/kvm/msr.txt
index 27c11a6..d079aed 100644
--- a/Documentation/kvm/msr.txt
+++ b/Documentation/kvm/msr.txt
@@ -154,9 +154,10 @@ MSR_KVM_SYSTEM_TIME: 0x12
 MSR_KVM_ASYNC_PF_EN: 0x4b564d02
data: Bits 63-6 hold 64-byte aligned physical address of a
64 byte memory area which must be in guest RAM and must be
-   zeroed. Bits 5-1 are reserved and should be zero. Bit 0 is 1
+   zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
when asynchronous page faults are enabled on the vcpu 0 when
-   disabled.
+   disabled. Bit 2 is 1 if asynchronous page faults can be injected
+   when vcpu is in cpl == 0.
 
First 4 byte of 64 byte memory location will be written to by
the hypervisor at the time of asynchronous page fault (APF)
@@ -184,4 +185,3 @@ MSR_KVM_ASYNC_PF_EN: 0x4b564d02
 
Currently type 2 APF will be always delivered on the same vcpu as
type 1 was, but guest should not rely on that.
-
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f1868ed..d2fa951 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -422,6 +422,7 @@ struct kvm_vcpu_arch {
struct gfn_to_hva_cache data;
u64 msr_val;
u32 id;
+   bool send_user_only;
} apf;
 };
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index fbfd367..d3a1a48 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -38,6 +38,7 @@
 #define KVM_MAX_MMU_OP_BATCH   32
 
 #define KVM_ASYNC_PF_ENABLED		(1 << 0)
+#define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE1
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 47ea93e..91b3d65 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -449,6 +449,9 @@ void __cpuinit kvm_guest_cpu_init(void)
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF)  kvmapf) {
 		u64 pa = __pa(&__get_cpu_var(apf_reason));
 
+#ifdef CONFIG_PREEMPT
+   pa |= KVM_ASYNC_PF_SEND_ALWAYS;
+#endif
wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
__get_cpu_var(apf_reason).enabled = 1;
 		printk(KERN_INFO "KVM setup async PF for cpu %d\n",
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e2fc59..1e442df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1435,8 +1435,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, 
u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 1:5 are resrved, Should be zero */
-	if (data & 0x3e)
+	/* Bits 2:5 are resrved, Should be zero */
+	if (data & 0x3c)
return 1;
 
vcpu-arch.apf.msr_val = data;
@@ -1450,6 +1450,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, 
u64 data)
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa))
return 1;
 
+	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
kvm_async_pf_wakeup_all(vcpu);
return 0;
 }
-- 
1.7.1



[PATCH v7 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-14 Thread Gleb Natapov
If a guest accesses swapped-out memory, do not swap it in from vcpu thread
context. Schedule work to do the swapping and put the vcpu into a halted
state instead.

Interrupts will still be delivered to the guest, and if an interrupt causes
a reschedule the guest will continue running another task.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |   18 
 arch/x86/kvm/Kconfig|1 +
 arch/x86/kvm/Makefile   |1 +
 arch/x86/kvm/mmu.c  |   52 +++-
 arch/x86/kvm/paging_tmpl.h  |4 +-
 arch/x86/kvm/x86.c  |  112 ++-
 include/linux/kvm_host.h|   31 +++
 include/trace/events/kvm.h  |   90 ++
 virt/kvm/Kconfig|3 +
 virt/kvm/async_pf.c |  190 +++
 virt/kvm/async_pf.h |   36 
 virt/kvm/kvm_main.c |   57 +---
 12 files changed, 578 insertions(+), 17 deletions(-)
 create mode 100644 virt/kvm/async_pf.c
 create mode 100644 virt/kvm/async_pf.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e209078..043e29e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -83,11 +83,14 @@
 #define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 
+#define ASYNC_PF_PER_VCPU 64
+
 extern spinlock_t kvm_lock;
 extern struct list_head vm_list;
 
 struct kvm_vcpu;
 struct kvm;
+struct kvm_async_pf;
 
 enum kvm_reg {
VCPU_REGS_RAX = 0,
@@ -412,6 +415,11 @@ struct kvm_vcpu_arch {
u64 hv_vapic;
 
cpumask_var_t wbinvd_dirty_mask;
+
+   struct {
+   bool halted;
+   gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
+   } apf;
 };
 
 struct kvm_arch {
@@ -585,6 +593,10 @@ struct kvm_x86_ops {
const struct trace_print_flags *exit_reasons_str;
 };
 
+struct kvm_arch_async_pf {
+   gfn_t gfn;
+};
+
 extern struct kvm_x86_ops *kvm_x86_ops;
 
 int kvm_mmu_module_init(void);
@@ -823,4 +835,10 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ddc131f..50f6364 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
+   select KVM_ASYNC_PF
select USER_RETURN_NOTIFIER
select KVM_MMIO
---help---
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..c53bf19 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,6 +9,7 @@ kvm-y   += $(addprefix ../../../virt/kvm/, 
kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
+kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
   i8254.o timer.o
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 908ea54..f01e89a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -18,9 +18,11 @@
  *
  */
 
+#include "irq.h"
 #include "mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
+#include "x86.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -2585,6 +2587,50 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
 error_code  PFERR_WRITE_MASK, gfn);
 }
 
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
+{
+   struct kvm_arch_async_pf arch;
+   arch.gfn = gfn;
+
+   return kvm_setup_async_pf(vcpu, gva, gfn, arch);
+}
+
+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
+		     kvm_event_needs_reinjection(vcpu)))
+		return false;
+
+	return kvm_x86_ops->interrupt_allowed(vcpu);
+}
+
+static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
+pfn_t *pfn)
+{
+   bool async;
+
+	*pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async);
+
+   if (!async)
+   return false; /* *pfn has correct page already */
+
+   put_page(pfn_to_page(*pfn));
+
+   if (can_do_async_pf(vcpu)) {
+   trace_kvm_try_async_get_page(async, *pfn);
+   if (kvm_find_async_pf_gfn(vcpu, 

[PATCH v7 05/12] Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.

2010-10-14 Thread Gleb Natapov
Async PF also needs to hook into smp_prepare_boot_cpu(), so move the hook
into generic code.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvm.c   |   11 +++
 arch/x86/kernel/kvmclock.c  |   13 +
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 7b562b6..e3faaaf 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,7 @@ struct kvm_mmu_op_release_pt {
 #include <asm/processor.h>
 
 extern void kvmclock_init(void);
+extern int kvm_register_clock(char *txt);
 
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 63b0ec8..e6db179 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -231,10 +231,21 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+#ifdef CONFIG_SMP
+static void __init kvm_smp_prepare_boot_cpu(void)
+{
+	WARN_ON(kvm_register_clock("primary cpu clock"));
+   native_smp_prepare_boot_cpu();
+}
+#endif
+
 void __init kvm_guest_init(void)
 {
if (!kvm_para_available())
return;
 
paravirt_ops_setup();
+#ifdef CONFIG_SMP
+   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+#endif
 }
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca43ce3..f98d3ea 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,7 +125,7 @@ static struct clocksource kvm_clock = {
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-static int kvm_register_clock(char *txt)
+int kvm_register_clock(char *txt)
 {
int cpu = smp_processor_id();
int low, high, ret;
@@ -152,14 +152,6 @@ static void __cpuinit kvm_setup_secondary_clock(void)
 }
 #endif
 
-#ifdef CONFIG_SMP
-static void __init kvm_smp_prepare_boot_cpu(void)
-{
-	WARN_ON(kvm_register_clock("primary cpu clock"));
-   native_smp_prepare_boot_cpu();
-}
-#endif
-
 /*
  * After the clock is registered, the host will keep writing to the
  * registered memory location. If the guest happens to shutdown, this memory
@@ -206,9 +198,6 @@ void __init kvmclock_init(void)
x86_cpuinit.setup_percpu_clockev =
kvm_setup_secondary_clock;
 #endif
-#ifdef CONFIG_SMP
-   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
-#endif
machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
machine_ops.crash_shutdown  = kvm_crash_shutdown;
-- 
1.7.1



[PATCH v7 00/12] KVM: Add host swap event notifications for PV guest

2010-10-14 Thread Gleb Natapov
KVM virtualizes guest memory by means of shadow pages or HW assistance
like NPT/EPT. Not all memory used by a guest is mapped into the guest
address space or even present in host memory at any given time.
When a vcpu tries to access a memory page that is not mapped into the guest
address space, KVM is notified about it. KVM maps the page into the guest
address space and resumes vcpu execution. If the page is swapped out from
host memory, vcpu execution is suspended till the page is swapped
into memory again. This is inefficient, since the vcpu can do other work
(run another task or serve interrupts) while the page gets swapped in.

The patch series tries to mitigate this problem by introducing two
mechanisms. The first one is used with non-PV guests and works like
this: when a vcpu tries to access a swapped-out page it is halted and the
requested page is swapped in by another thread. That way the vcpu can still
process interrupts while I/O happens in parallel and, with any luck, an
interrupt will cause the guest to schedule another task on the vcpu, so
it will have work to do instead of waiting for the page to be swapped in.

The second mechanism introduces PV notification about swapped page state to
the guest (asynchronous page fault). Instead of halting the vcpu upon access
to a swapped-out page and hoping that some interrupt will cause a reschedule,
we immediately inject an asynchronous page fault into the vcpu. A PV-aware
guest knows that upon receiving such an exception it should schedule another
task to run on the vcpu. The current task is put to sleep until another kind
of asynchronous page fault is received, notifying the guest that the page
is now in host memory, so the task waiting for it can run again.

To measure the performance benefits I use a simple benchmark program (below)
that starts a number of threads. Some of them do work (increment a counter),
others access a huge array at random locations, trying to generate host page
faults. The size of the array is smaller than guest memory but bigger
than host memory, so we are guaranteed that the host will swap out part of
the array.

I ran the benchmark on three setups: with current kvm.git (master),
with my patch series + non-pv guest (nonpv) and with my patch series +
pv guest (pv).

Each guest had 4 cpus and 2G memory and was launched inside 512M memory
container. The command line was ./bm -f 4 -w 4 -t 60 (run 4 faulting
threads and 4 working threads for a minute).

Below is the total amount of work each guest managed to do
(average of 10 runs):
         total work      std error
master:  122789420615    (3818565029)
nonpv:   138455939001    (773774299)
pv:      234351846135    (10461117116)

Changes:
 v1->v2
   Use MSR instead of hypercall.
   Move most of the code into an arch-independent place.
   Halt inside a guest instead of doing a wait-for-page hypercall if
    preemption is disabled.
 v2->v3
   Use MSR from range 0x4b564dxx.
   Add slot version tracking.
   Support migration by restarting all guest processes after migration.
   Drop patch that tracked preemptability for non-preemptable kernels
    due to performance concerns. Send async PF to non-preemptable
    guests only when vcpu is executing userspace code.
 v3->v4
  Provide alternative page fault handler in PV guest instead of adding hook to
   standard page fault handler and patching it out on non-PV guests.
  Allow only a limited number of outstanding async page faults per vcpu.
  Unify gfn_to_pfn and gfn_to_pfn_async code.
  Cancel outstanding slow work on reset.
 v4->v5
  Move async pv cpu initialization into cpu hotplug notifier.
  Use GFP_NOWAIT instead of GFP_ATOMIC for allocations that shouldn't sleep.
  Process KVM_REQ_MMU_SYNC even in page_fault_other_cr3() before changing
   cr3 back.
 v5->v6
  Too many. Will list only major changes here.
  Replace slow work with work queues.
  Halt vcpu for non-pv guests.
  Handle async PF in nested SVM mode.
  Do not prefault swapped-in page for non-tdp case.
 v6->v7
  Fix GUP failure in work thread problem.
  Do prefault only if mmu is in direct map mode.
  Use cpu->request to ask for vcpu halt (drop optimization that tried to
   skip non-present apf injection if page is swapped in before next vmentry).
  Keep track of synthetic halt in separate state to prevent it from leaking
   during migration.
  Fix memslot tracking problems.
  More documentation.
  Other small comments are addressed.

Gleb Natapov (12):
  Add get_user_pages() variant that fails if major fault is required.
  Halt vcpu if page it tries to access is swapped out.
  Retry fault before vmentry
  Add memory slot versioning and use it to provide fast guest write interface
  Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
  Add PV MSR to enable asynchronous page faults delivery.
  Add async PF initialization to PV guest.
  Handle async PF in a guest.
  Inject asynchronous page fault into a PV guest if page is swapped out.
  Handle async PF in non preemptable context
  Let host know whether the guest can handle async PF in non-userspace context.
  Send 

[PATCH v7 04/12] Add memory slot versioning and use it to provide fast guest write interface

2010-10-14 Thread Gleb Natapov
Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 include/linux/kvm_host.h  |7 
 include/linux/kvm_types.h |7 
 virt/kvm/kvm_main.c   |   75 +---
 3 files changed, 77 insertions(+), 12 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9a9b017..dda88f2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -199,6 +199,7 @@ struct kvm_irq_routing_table {};
 
 struct kvm_memslots {
int nmemslots;
+   u64 generation;
struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
KVM_PRIVATE_MEM_SLOTS];
 };
@@ -352,12 +353,18 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 int offset, int len);
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
unsigned long len);
+int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len);
+int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+ gpa_t gpa);
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
+gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 7ac0d4e..fa7cc72 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -67,4 +67,11 @@ struct kvm_lapic_irq {
u32 dest_id;
 };
 
+struct gfn_to_hva_cache {
+   u64 generation;
+   gpa_t gpa;
+   unsigned long hva;
+   struct kvm_memory_slot *memslot;
+};
+
 #endif /* __KVM_TYPES_H__ */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 238079e..5d57ec9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -687,6 +687,7 @@ skip_lpage:
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
if (mem->slot >= slots->nmemslots)
slots->nmemslots = mem->slot + 1;
+   slots->generation++;
slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
 
old_memslots = kvm->memslots;
@@ -723,6 +724,7 @@ skip_lpage:
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
if (mem->slot >= slots->nmemslots)
slots->nmemslots = mem->slot + 1;
+   slots->generation++;
 
/* actual memory is freed via old in kvm_free_physmem_slot below */
if (!npages) {
@@ -853,10 +855,10 @@ int kvm_is_error_hva(unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(kvm_is_error_hva);
 
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+static struct kvm_memory_slot *__gfn_to_memslot(struct kvm_memslots *slots,
+   gfn_t gfn)
 {
int i;
-   struct kvm_memslots *slots = kvm_memslots(kvm);
 
for (i = 0; i < slots->nmemslots; ++i) {
struct kvm_memory_slot *memslot = slots->memslots[i];
@@ -867,6 +869,11 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
}
return NULL;
 }
+
+struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+{
+   return __gfn_to_memslot(kvm_memslots(kvm), gfn);
+}
 EXPORT_SYMBOL_GPL(gfn_to_memslot);
 
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
@@ -929,12 +936,9 @@ int memslot_id(struct kvm *kvm, gfn_t gfn)
return memslot - slots->memslots;
 }
 
-static unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn,
+static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 gfn_t *nr_pages)
 {
-   struct kvm_memory_slot *slot;
-
-   slot = gfn_to_memslot(kvm, gfn);
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return bad_hva();
 
@@ -946,7 +950,7 @@ static unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn,
 
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 {
-   return gfn_to_hva_many(kvm, gfn, NULL);
+   return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
@@ -1063,7 +1067,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
unsigned long addr;

[PATCH v7 06/12] Add PV MSR to enable asynchronous page faults delivery.

2010-10-14 Thread Gleb Natapov
Guest enables async PF vcpu functionality using this MSR.

Reviewed-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kvm/cpuid.txt |3 +++
 Documentation/kvm/msr.txt   |   36 +++-
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/include/asm/kvm_para.h |4 
 arch/x86/kvm/x86.c  |   38 --
 include/linux/kvm.h |1 +
 include/linux/kvm_host.h|1 +
 virt/kvm/async_pf.c |   20 
 8 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/Documentation/kvm/cpuid.txt b/Documentation/kvm/cpuid.txt
index 14a12ea..8820685 100644
--- a/Documentation/kvm/cpuid.txt
+++ b/Documentation/kvm/cpuid.txt
@@ -36,6 +36,9 @@ KVM_FEATURE_MMU_OP || 2 || deprecated.
 KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
||   || 0x4b564d00 and 0x4b564d01
 --
+KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
+   ||   || writing to msr 0x4b564d02
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/Documentation/kvm/msr.txt b/Documentation/kvm/msr.txt
index 8ddcfe8..27c11a6 100644
--- a/Documentation/kvm/msr.txt
+++ b/Documentation/kvm/msr.txt
@@ -3,7 +3,6 @@ Glauber Costa glom...@redhat.com, Red Hat Inc, 2010
 =
 
 KVM makes use of some custom MSRs to service some requests.
-At present, this facility is only used by kvmclock.
 
 Custom MSRs have a range reserved for them, that goes from
 0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
@@ -151,3 +150,38 @@ MSR_KVM_SYSTEM_TIME: 0x12
return PRESENT;
} else
return NON_PRESENT;
+
+MSR_KVM_ASYNC_PF_EN: 0x4b564d02
+   data: Bits 63-6 hold 64-byte aligned physical address of a
+   64 byte memory area which must be in guest RAM and must be
+   zeroed. Bits 5-1 are reserved and should be zero. Bit 0 is 1
+   when asynchronous page faults are enabled on the vcpu, 0 when
+   disabled.
+
+   The first 4 bytes of the 64 byte memory location will be written to
+   by the hypervisor at the time of asynchronous page fault (APF)
+   injection, to indicate the type of asynchronous page fault. A value
+   of 1 means that the page referred to by the page fault is not
+   present. A value of 2 means that the page is now available. Disabling
+   interrupts inhibits APFs. The guest must not enable interrupts
+   before the reason is read, or it may be overwritten by another
+   APF. Since APF uses the same exception vector as a regular page
+   fault, the guest must reset the reason to 0 before it does
+   something that can generate a normal page fault. If during a page
+   fault the APF reason is 0, it means that this is a regular page
+   fault.
+
+   During delivery of type 1 APF cr2 contains a token that will
+   be used to notify a guest when missing page becomes
+   available. When page becomes available type 2 APF is sent with
+   cr2 set to the token associated with the page. There is special
+   kind of token 0xffffffff which tells the vcpu that it should wake
+   up all processes waiting for APFs and no individual type 2 APFs
+   will be sent.
+
+   If APF is disabled while there are outstanding APFs, they will
+   not be delivered.
+
+   Currently a type 2 APF will always be delivered on the same vcpu as
+   the type 1 was, but the guest should not rely on that.
+
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 96aca44..26b2064 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -419,6 +419,8 @@ struct kvm_vcpu_arch {
struct {
bool halted;
gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
+   struct gfn_to_hva_cache data;
+   u64 msr_val;
} apf;
 };
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index e3faaaf..8662ae0 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,7 @@
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2	3
+#define KVM_FEATURE_ASYNC_PF   4
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -32,9 +33,12 @@
 /* Custom MSRs falls in the range 

[PATCH v7 07/12] Add async PF initialization to PV guest.

2010-10-14 Thread Gleb Natapov
Enable async PF in a guest if async PF capability is discovered.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kernel-parameters.txt |3 +
 arch/x86/include/asm/kvm_para.h |6 ++
 arch/x86/kernel/kvm.c   |   92 +++
 3 files changed, 101 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 8dc2548..0bd2203 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1699,6 +1699,9 @@ and is between 256 and 4096 characters. It is defined in the file
 
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
 
+   no-kvmapf   [X86,KVM] Disable paravirtualized asynchronous page
+   fault handling.
+
nolapic [X86-32,APIC] Do not enable or use the local APIC.
 
nolapic_timer   [X86-32,APIC] Do not use the local APIC timer.
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 8662ae0..2315398 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -65,6 +65,12 @@ struct kvm_mmu_op_release_pt {
__u64 pt_phys;
 };
 
+struct kvm_vcpu_pv_apf_data {
+   __u32 reason;
+   __u8 pad[60];
+   __u32 enabled;
+};
+
 #ifdef __KERNEL__
 #include <asm/processor.h>
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e6db179..032d03b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,16 +27,30 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <linux/hardirq.h>
+#include <linux/notifier.h>
+#include <linux/reboot.h>
 #include <asm/timer.h>
+#include <asm/cpu.h>
 
 #define MMU_QUEUE_SIZE 1024
 
+static int kvmapf = 1;
+
+static int parse_no_kvmapf(char *arg)
+{
+kvmapf = 0;
+return 0;
+}
+
+early_param("no-kvmapf", parse_no_kvmapf);
+
 struct kvm_para_state {
u8 mmu_queue[MMU_QUEUE_SIZE];
int mmu_queue_len;
 };
 
 static DEFINE_PER_CPU(struct kvm_para_state, para_state);
+static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 
 static struct kvm_para_state *kvm_para_state(void)
 {
@@ -231,12 +245,86 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+void __cpuinit kvm_guest_cpu_init(void)
+{
+   if (!kvm_para_available())
+   return;
+
+   if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
+   u64 pa = __pa(&__get_cpu_var(apf_reason));
+
+   wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
+   __get_cpu_var(apf_reason).enabled = 1;
+   printk(KERN_INFO "KVM setup async PF for cpu %d\n",
+  smp_processor_id());
+   }
+}
+
+static void kvm_pv_disable_apf(void *unused)
+{
+   if (!__get_cpu_var(apf_reason).enabled)
+   return;
+
+   wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
+   __get_cpu_var(apf_reason).enabled = 0;
+
+   printk(KERN_INFO "Unregister pv shared memory for cpu %d\n",
+  smp_processor_id());
+}
+
+static int kvm_pv_reboot_notify(struct notifier_block *nb,
+   unsigned long code, void *unused)
+{
+   if (code == SYS_RESTART)
+   on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_pv_reboot_nb = {
+   .notifier_call = kvm_pv_reboot_notify,
+};
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
WARN_ON(kvm_register_clock("primary cpu clock"));
+   kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
 }
+
+static void kvm_guest_cpu_online(void *dummy)
+{
+   kvm_guest_cpu_init();
+}
+
+static void kvm_guest_cpu_offline(void *dummy)
+{
+   kvm_pv_disable_apf(NULL);
+}
+
+static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
+   unsigned long action, void *hcpu)
+{
+   int cpu = (unsigned long)hcpu;
+   switch (action) {
+   case CPU_ONLINE:
+   case CPU_DOWN_FAILED:
+   case CPU_ONLINE_FROZEN:
+   smp_call_function_single(cpu, kvm_guest_cpu_online, NULL, 0);
+   break;
+   case CPU_DOWN_PREPARE:
+   case CPU_DOWN_PREPARE_FROZEN:
+   smp_call_function_single(cpu, kvm_guest_cpu_offline, NULL, 1);
+   break;
+   default:
+   break;
+   }
+   return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata kvm_cpu_notifier = {
+.notifier_call  = kvm_cpu_notify,
+};
 #endif
 
 void __init kvm_guest_init(void)
@@ -245,7 +333,11 @@ void __init kvm_guest_init(void)
return;
 
paravirt_ops_setup();
+   register_reboot_notifier(&kvm_pv_reboot_nb);
 #ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+   register_cpu_notifier(&kvm_cpu_notifier);
+#else
+   kvm_guest_cpu_init();
 #endif
 

[AUTOTEST] [PATCH 1/2] KVM : ping6 test

2010-10-14 Thread pradeep
This patch is for Ping6 testing

* ping6 with various message sizes guest to/from local/remote host
  using link-local addresses 
  By default IPv6 seems to be disabled on virbr0. Enable it by
  doing echo 0 > /proc/sys/net/ipv6/conf/virbr0/disable_ipv6

Please find the below attached patch.

Thanks
Pradeep




ipv6_1
Description: Binary data


[PATCH v7 03/12] Retry fault before vmentry

2010-10-14 Thread Gleb Natapov
When page is swapped in it is mapped into guest memory only after guest
tries to access it again and generate another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page. Do it only when tdp is enabled for now. Shadow paging case is
more complicated. CR[034] and EFER registers should be switched before
doing mapping and then switched back.

Acked-by: Rik van Riel r...@redhat.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |4 +++-
 arch/x86/kvm/mmu.c  |   16 
 arch/x86/kvm/paging_tmpl.h  |6 +++---
 arch/x86/kvm/x86.c  |7 +++
 virt/kvm/async_pf.c |2 ++
 5 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 043e29e..96aca44 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -241,7 +241,7 @@ struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
-   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
+   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool no_apf);
void (*inject_page_fault)(struct kvm_vcpu *vcpu);
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
@@ -839,6 +839,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f01e89a..11d152b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2568,7 +2568,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 }
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
-   u32 error_code)
+   u32 error_code, bool no_apf)
 {
gfn_t gfn;
int r;
@@ -2604,8 +2604,8 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
return kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
-static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
-pfn_t *pfn)
+static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
+gva_t gva, pfn_t *pfn)
 {
bool async;
 
@@ -2616,7 +2616,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
 
put_page(pfn_to_page(*pfn));
 
-   if (can_do_async_pf(vcpu)) {
+   if (!no_apf  can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(async, *pfn);
if (kvm_find_async_pf_gfn(vcpu, gfn)) {
trace_kvm_async_pf_doublefault(gva, gfn);
@@ -2631,8 +2631,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
return false;
 }
 
-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
-   u32 error_code)
+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
+ bool no_apf)
 {
pfn_t pfn;
int r;
@@ -2654,7 +2654,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-   if (try_async_pf(vcpu, gfn, gpa, pfn))
+   if (try_async_pf(vcpu, no_apf, gfn, gpa, pfn))
return 0;
 
/* mmio */
@@ -3317,7 +3317,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
int r;
enum emulation_result er;
 
-   r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
+   r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
if (r < 0)
goto out;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index c45376d..d6b281e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -527,8 +527,8 @@ out_gpte_changed:
  *  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
  *   a negative value on error.
  */
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
-  u32 error_code)
+static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
+bool no_apf)
 {
int write_fault = error_code & PFERR_WRITE_MASK;
int user_fault = error_code & PFERR_USER_MASK;
@@ -569,7 +569,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
mmu_seq = 

[AUTOTEST] [PATCH 1/2] KVM : ping6 test

2010-10-14 Thread pradeep
Changes for tests_base.cfg to include ping6 test

Please find below attached patch.

Thanks
Pradeep

ipv6_2
Description: Binary data


Re: [AUTOTEST] [PATCH 1/2] KVM : ping6 test

2010-10-14 Thread Amos Kong
On Thu, Oct 14, 2010 at 02:56:59PM +0530, pradeep wrote:
 This patch is for Ping6 testing
 
 * ping6 with various message sizes guest to/from local/remote host
   using link-local addresses 
   By default IPv6 seems to be disabled on virbr0. Enable it by
   doing echo 0 > /proc/sys/net/ipv6/conf/virbr0/disable_ipv6
 
 Please find the below attached patch

We also need to update the related code in kvm_test_utils.py, and consider
the difference between 'ping' and 'ping6'.

 Signed-off-by: Pradeep K Surisetty psuri...@linux.vnet.ibm.com
 ---
 --- autotest/client/tests/kvm/tests/ping.py   2010-10-14 14:20:52.523791118 
 +0530
 +++ autotest_new/client/tests/kvm/tests/ping.py   2010-10-14 
 14:46:57.711797139 +0530
 @@ -1,5 +1,6 @@
 -import logging
 +import logging, time
  from autotest_lib.client.common_lib import error
 +from autotest_lib.client.bin import utils
  import kvm_test_utils
  
  
 @@ -27,10 +28,18 @@ def run_ping(test, params, env):
  nics = params.get("nics").split()
  strict_check = params.get("strict_check", "no") == "yes"
  
 +address_type = params.get("address_type")
 +#By default IPv6 seems to be disabled on virbr0.
 +ipv6_cmd = "echo %s > /proc/sys/net/ipv6/conf/virbr0/disable_ipv6"

We may use another bridge, so we need to replace the hardcoded 'virbr0' name.
We can refer to 'autotest-upstream/client/tests/kvm/scripts/qemu-ifup':
   switch=$(/usr/sbin/brctl show | awk 'NR==2 { print $1 }')


 +
  packet_size = [0, 1, 4, 48, 512, 1440, 1500, 1505, 4054, 4055, 4096, 
 4192,
 8878, 9000, 32767, 65507]
  
  try:
 +if address_type == "ipv6":
 +utils.run(ipv6_cmd % "0")
 +time.sleep(5)
 +
  for i, nic in enumerate(nics):
  ip = vm.get_address(i)
  if not ip:
 @@ -68,5 +77,9 @@ def run_ping(test, params, env):
  if status != 0:
  raise error.TestFail("Ping returns non-zero value %s" %
   output)
 +if address_type == "ipv6":
 +utils.run(ipv6_cmd % "1")
 +time.sleep(5)
 +
  finally:
  session.close()
 ---
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/8] port qemu-kvm's MCE support (v3)

2010-10-14 Thread Avi Kivity

 On 10/11/2010 08:31 PM, Marcelo Tosatti wrote:

Port qemu-kvm's KVM MCE (Machine Check Exception) handling to qemu. It
allows qemu to propagate MCEs to the guest.

v2:
- rename do_qemu_ram_addr_from_host.
- fix kvm_on_sigbus/kvm_on_sigbus_vcpu naming.
- fix bank register restoration (Dean Nelson).

v3:
- condition MCE generation on MCE_SEG_P bit (Huang Ying).



I only see patches 1 and 4 from v2, and this cover letter from v3.  
Please repost.


Also, if the patchset ends up with qemu-kvm master being different from 
uq/master in this area, please post the corresponding qemu-kvm master 
patches.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Avi Kivity

 On 10/10/2010 05:46 PM, Gleb Natapov wrote:


  We should log both errno and exit_reason.  If we want to be clever,
  we can display strerror(errno) if it's nonzero, and exit_reason
  otherwise (easy to do in a trace-cmd plugin).

For starters we should remove KVM_EXIT_INTR exit reason. Looking into
qemu-kvm history it was never used and there is at least one code path
that returns -EINTR and does not set KVM_EXIT_INTR, so exit_reason field
contains stale info on exit.



The two issues are unrelated.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Gleb Natapov
On Thu, Oct 14, 2010 at 12:27:15PM +0200, Avi Kivity wrote:
  On 10/10/2010 05:46 PM, Gleb Natapov wrote:
 
   We should log both errno and exit_reason.  If we want to be clever,
   we can display strerror(errno) if it's nonzero, and exit_reason
   otherwise (easy to do in a trace-cmd plugin).
 
 For starters we should remove KVM_EXIT_INTR exit reason. Looking into
 qemu-kvm history it was never used and there is at least one code path
 that returns -EINTR and does not set KVM_EXIT_INTR, so exit_reason field
 contains stale info on exit.
 
 
 The two issues are unrelated.
 
So what do you propose? I see no issue with my original patch.

--
Gleb.


Re: [Autotest] [AUTOTEST] [PATCH 1/2] KVM : ping6 test

2010-10-14 Thread pradeep
On Thu, 14 Oct 2010 18:05:04 +0800
Amos Kong ak...@redhat.com wrote:

 On Thu, Oct 14, 2010 at 02:56:59PM +0530, pradeep wrote:
  This patch is for Ping6 testing
  
  * ping6 with various message sizes guest to/from local/remote
  host using link-local addresses 
    By default IPv6 seems to be disabled on virbr0. Enable it by
    doing echo 0 > /proc/sys/net/ipv6/conf/virbr0/disable_ipv6
  
  Please find the below attached patch
 
 We also need to update the related code in kvm_test_utils.py, and consider
 the difference between 'ping' and 'ping6'.


The ping6 test calls the same ping code and just enables IPv6,
so we don't need to make any changes in kvm_test_utils.py for
ping6.

 
  Signed-off-by: Pradeep K Surisetty psuri...@linux.vnet.ibm.com
  ---
  --- autotest/client/tests/kvm/tests/ping.py 2010-10-14
  14:20:52.523791118 +0530 +++
  autotest_new/client/tests/kvm/tests/ping.py 2010-10-14
  14:46:57.711797139 +0530 @@ -1,5 +1,6 @@ -import logging
  +import logging, time
   from autotest_lib.client.common_lib import error
  +from autotest_lib.client.bin import utils
   import kvm_test_utils
   
   
  @@ -27,10 +28,18 @@ def run_ping(test, params, env):
   nics = params.get(nics).split()
   strict_check = params.get(strict_check, no) == yes
   
  +address_type = params.get(address_type)
  +#By default IPv6 seems to be disabled on virbr0. 
  +ipv6_cmd = "echo %s > /proc/sys/net/ipv6/conf/virbr0/disable_ipv6"
 
 We may use another bridge, so we need to replace the hardcoded 'virbr0' name.
 We can refer to
 'autotest-upstream/client/tests/kvm/scripts/qemu-ifup'
 switch=$(/usr/sbin/brctl show | awk 'NR==2 { print $1 }')
 
 
  +
   packet_size = [0, 1, 4, 48, 512, 1440, 1500, 1505, 4054, 4055,
  4096, 4192, 8878, 9000, 32767, 65507]
   
   try:
  +if address_type == "ipv6":
  +utils.run(ipv6_cmd % "0")
  +time.sleep(5)
  +
   for i, nic in enumerate(nics):
   ip = vm.get_address(i)
   if not ip:
  @@ -68,5 +77,9 @@ def run_ping(test, params, env):
   if status != 0:
   raise error.TestFail("Ping returns non-zero
  value %s" % output)
  +if address_type == "ipv6":
  +utils.run(ipv6_cmd % "1")
  +time.sleep(5)
  +
   finally:
   session.close()
  ---
 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest



Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Avi Kivity

 On 10/14/2010 12:29 PM, Gleb Natapov wrote:

On Thu, Oct 14, 2010 at 12:27:15PM +0200, Avi Kivity wrote:
   On 10/10/2010 05:46 PM, Gleb Natapov wrote:
  
 We should log both errno and exit_reason.  If we want to be clever,
 we can display strerror(errno) if it's nonzero, and exit_reason
 otherwise (easy to do in a trace-cmd plugin).
  
  For starters we should remove KVM_EXIT_INTR exit reason. Looking into
  qemu-kvm history it was never used and there is at least one code path
  that returns -EINTR and does not set KVM_EXIT_INTR, so exit_reason field
  contains stale info on exit.
  

  The two issues are unrelated.

So what do you propose? I see no issue with my original patch.


Record both errno and exit_reason.  While they're never both valid at 
the same time, they're both necessary.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Gleb Natapov
On Thu, Oct 14, 2010 at 01:11:20PM +0200, Avi Kivity wrote:
  On 10/14/2010 12:29 PM, Gleb Natapov wrote:
 On Thu, Oct 14, 2010 at 12:27:15PM +0200, Avi Kivity wrote:
On 10/10/2010 05:46 PM, Gleb Natapov wrote:
   
  We should log both errno and exit_reason.  If we want to be clever,
  we can display strerror(errno) if it's nonzero, and exit_reason
  otherwise (easy to do in a trace-cmd plugin).
   
   For starters we should remove KVM_EXIT_INTR exit reason. Looking into
   qemu-kvm history it was never used and there is at least one code path
   that returns -EINTR and does not set KVM_EXIT_INTR, so exit_reason field
   contains stale info on exit.
   
 
   The two issues are unrelated.
 
 So what do you propose? I see no issue with my original patch.
 
 Record both errno and exit_reason.  While they're never both valid
 at the same time, they're both necessary.
 
If they can't be valid at the same time, why not record just one of them?
The one that happened? Also, any error other than -EINTR will cause qemu to
stop, so it is not very interesting. And the ioctl return value can be
traced by strace anyway.

--
Gleb.


Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Avi Kivity

 On 10/14/2010 01:28 PM, Gleb Natapov wrote:

On Thu, Oct 14, 2010 at 01:11:20PM +0200, Avi Kivity wrote:
   On 10/14/2010 12:29 PM, Gleb Natapov wrote:
  On Thu, Oct 14, 2010 at 12:27:15PM +0200, Avi Kivity wrote:
  On 10/10/2010 05:46 PM, Gleb Natapov wrote:
 
 We should log both errno and exit_reason.  If we want to be 
clever,
 we can display strerror(errno) if it's nonzero, and exit_reason
 otherwise (easy to do in a trace-cmd plugin).
 
 For starters we should remove KVM_EXIT_INTR exit reason. Looking into
 qemu-kvm history it was never used and there is at least one code path
 that returns -EINTR and does not set KVM_EXIT_INTR, so exit_reason 
field
 contains stale info on exit.
 
  
 The two issues are unrelated.
  
  So what do you propose? I see no issue with my original patch.

  Record both errno and exit_reason.  While they're never both valid
  at the same time, they're both necessary.

If they can't be valid at the same time, why not record one of them?


If you record just one, you don't know if the other one happened.


The
one that happened? Also any error other than -EINTR will cause qemu to
stop, so it is not very interesting. And ioctl return value can be
traced by strace anyway.


You can't correlate it with ftrace.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Gleb Natapov
On Thu, Oct 14, 2010 at 01:32:14PM +0200, Avi Kivity wrote:
  On 10/14/2010 01:28 PM, Gleb Natapov wrote:
 On Thu, Oct 14, 2010 at 01:11:20PM +0200, Avi Kivity wrote:
On 10/14/2010 12:29 PM, Gleb Natapov wrote:
   On Thu, Oct 14, 2010 at 12:27:15PM +0200, Avi Kivity wrote:
   On 10/10/2010 05:46 PM, Gleb Natapov wrote:
  
  We should log both errno and exit_reason.  If we want to be 
  clever,
  we can display strerror(errno) if it's nonzero, and exit_reason
  otherwise (easy to do in a trace-cmd plugin).
  
  For starters we should remove KVM_EXIT_INTR exit reason. Looking 
  into
  qemu-kvm history it was never used and there is at least one code 
  path
  that returns -EINTR and does not set KVM_EXIT_INTR, so exit_reason 
  field
  contains stale info on exit.
  
   
  The two issues are unrelated.
   
   So what do you propose? I see no issue with my original patch.
 
   Record both errno and exit_reason.  While they're never both valid
   at the same time, they're both necessary.
 
 If they can't be valid at the same time, why not record one of them?
 
 If you record just one, you don't know if the other one happened.
 
I mean to record type/value. So it can be (type error/value -EINVAL) or
(type exit/value HALT). The goal is to not print non-relevant info in
ftrace, but we can do the same by recording both errno and exit_reason and
showing only exit_reason if errno == 0, or errno otherwise.

 The
 one that happened? Also any error other than -EINTR will cause qemu to
 stop, so it is not very interesting. And ioctl return value can be
 traced by strace anyway.
 
 You can't correlate it with ftrace.
 
True. But given that the only interesting error code is -EINTR, I do not
know if this is useful, but potentially may generate a lot of events in
ftrace. Sometimes too much info is almost as bad as not enough and if we
will use the same event for both exit_reason and errno it will be
impossible to enable one without the other.

--
Gleb.


Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Gleb Natapov
On Thu, Oct 14, 2010 at 01:43:52PM +0200, Avi Kivity wrote:
  On 10/14/2010 01:41 PM, Gleb Natapov wrote:
  So what do you propose? I see no issue with my original patch.
   
  Record both errno and exit_reason.  While they're never both valid
  at the same time, they're both necessary.
   
   If they can't be valid at the same time, why not record one of them?
 
   If you record just one, you don't know if the other one happened.
 
 I mean to record type/value. So it can be (type error/value -EINVAL) or
 (type exit/value HALT). The goal is to not print non-relevant info in
 ftrace, but we can do the same by recording both errno and exit_reason and
 showing only exit_reason if errno == 0, or errno otherwise.
 
 That's fine.  As long as you don't drop information.
 
   The
   one that happened? Also any error other than -EINTR will cause qemu to
   stop, so it is not very interesting. And ioctl return value can be
   traced by strace anyway.
 
   You can't correlate it with ftrace.
 
 True. But given that the only interesting error code is -EINTR, I do not
 know if this is useful, but potentially may generate a lot of events in
 ftrace. Sometimes too much info is almost as bad as not enough and if we
 will use the same event for both exit_reason and errno it will be
 impossible to enable one without the other.
 
 You can always filter excess information away.  If you're looking
 for exits to userspace (a major performance issue) then you want to
 see both normal exits and signal exits.
 
How do I do it via the /debug file system? Now I just echo the name of the
event I am interested in into set_event; how do I add a filter to that?

--
Gleb.


[PATCH] KVM: Fix signature of kvm_iommu_map_pages stub

2010-10-14 Thread Jan Kiszka
Breaks otherwise if CONFIG_IOMMU_API is not set.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 include/linux/kvm_host.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b89d00..866ed30 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -483,8 +483,7 @@ int kvm_deassign_device(struct kvm *kvm,
struct kvm_assigned_dev_kernel *assigned_dev);
 #else /* CONFIG_IOMMU_API */
 static inline int kvm_iommu_map_pages(struct kvm *kvm,
- gfn_t base_gfn,
- unsigned long npages)
+ struct kvm_memory_slot *slot)
 {
return 0;
 }
-- 
1.7.1


Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Avi Kivity

 On 10/14/2010 09:27 AM, Jan Kiszka wrote:

Hi,

I'm seeing quite frequent corruptions of the VESA frame buffer with
Linux guests (vga=0x317) that are starting with KVM kernel modules of
upcoming 2.6.36 (I'm currently running -rc7). The effect disappears when
downgrading to kvm-kmod-2.6.35.6. Will see if I can bisect later, but
maybe someone already has an idea or wants to reproduce (just run
something like find / on one text console and switch to another one -
text fragments will remain on the screen after every few switches).



Reproduces on kvm.git.  I wonder what's going on.

Looks like vesafb uses the bios to switch the display start, so I expect 
a problem in qemu reacting to this.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Avi Kivity

 On 10/14/2010 12:54 AM, Anthony Liguori wrote:

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:


Hi,

Using the legacy way of starting up NICs, I am hitting a limitation after 29
NICs, i.e. no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus - 3 are already taken up).
I had initially increased the MAX_NICS to 48, just on my tree, to get to
more, but of course that won't work.
Is there any way to go beyond 29 NICs the legacy way? What is the maximum
that can be supported by the qdev method?


I got up to 104 without trying very hard using the following script:

args=""
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
    for fn in 0 1 2 3 4 5 6 7; do
        args="$args -netdev user,id=eth${slot}_${fn}"
        args="$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile="
    done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm


The key is to make the virtio-net devices multifunction and to fill 
out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.  If we want to support a 
large number of interfaces, we need true multiport cards.


What's the motivation for such a huge number of interfaces?

--
error compiling committee.c: too many arguments to function


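The slot/function packing from the script above can be sanity-checked standalone. A minimal sketch, assuming nothing beyond POSIX shell (no qemu is invoked here, and the variable names are mine):

```shell
# Count how many -device entries the 13-slots x 8-functions enumeration
# produces. Only the argument string is built; nothing is executed.
args=""
count=0
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
    for fn in 0 1 2 3 4 5 6 7; do
        args="$args -device virtio-net-pci,addr=${slot}.${fn},multifunction=on"
        count=$((count + 1))
    done
done
echo "$count"   # 13 slots x 8 functions = 104 devices
```

This prints 104, matching the figure reported above. The loop always starts at function 0 in each slot, which matters because PCI only scans functions 1-7 of a slot when function 0 is present with the multifunction bit set.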

Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Avi Kivity

 On 10/14/2010 01:47 PM, Gleb Natapov wrote:


  You can always filter excess information away.  If you're looking
  for exits to userspace (a major performance issue) then you want to
  see both normal exits and signal exits.

How do I do it via the /debug file system? Now I just echo the name of the
event I am interested in into set_event; how do I add a filter to that?


echo exit_reason==33 > /sys/kernel/debug/tracing/events/kvm/kvm_exit/filter

--
error compiling committee.c: too many arguments to function



Re: [PATCH] trace exit to userspace event

2010-10-14 Thread Avi Kivity

 On 10/14/2010 02:09 PM, Avi Kivity wrote:

 On 10/14/2010 01:47 PM, Gleb Natapov wrote:


  You can always filter excess information away.  If you're looking
  for exits to userspace (a major performance issue) then you want to
  see both normal exits and signal exits.

How do I do it via the /debug file system? Now I just echo the name of the
event I am interested in into set_event; how do I add a filter to that?


echo exit_reason==33 > /sys/kernel/debug/tracing/events/kvm/kvm_exit/filter




Or, easier,

  trace-cmd record -e kvm:kvm_exit -f exit_reason==33

--
error compiling committee.c: too many arguments to function


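The debugfs workflow discussed above (echo an event name into set_event, then attach a per-event filter) can be sketched end to end. The sketch below runs against a throwaway directory instead of the real /sys/kernel/debug/tracing, since the real files are created by the kernel; the layout mirrored here is an assumption based on the commands quoted in the thread:

```shell
# Stand-in for /sys/kernel/debug/tracing; on a real system the kernel
# creates these files and writing to them programs the tracer.
TRACING=$(mktemp -d)
mkdir -p "$TRACING/events/kvm/kvm_exit"

# Enable the kvm_exit event, then restrict it to exits to userspace
# with exit_reason 33, as in the one-liners above:
echo kvm:kvm_exit > "$TRACING/set_event"
echo 'exit_reason == 33' > "$TRACING/events/kvm/kvm_exit/filter"

cat "$TRACING/events/kvm/kvm_exit/filter"
```

On a real kernel, writing 0 to the filter file clears the filter again.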

Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Jan Kiszka
On 14.10.2010 09:27, Jan Kiszka wrote:
 Hi,
 
 I'm seeing quite frequent corruptions of the VESA frame buffer with
 Linux guests (vga=0x317) that are starting with KVM kernel modules of
 upcoming 2.6.36 (I'm currently running -rc7). The effect disappears when
 downgrading to kvm-kmod-2.6.35.6. Will see if I can bisect later, but
 maybe someone already has an idea or wants to reproduce (just run
 something like find / on one text console and switch to another one -
 text fragments will remain on the screen on every few switches).

Commit d25f31f488e5f7597c17a3ac7d82074de8138e3b in kvm.git (KVM: x86:
avoid unnecessary bitmap allocation when memslot is clean) is at least
magnifying the issue. With this patch applied, I can easily trigger
display corruptions when switching between VGA consoles while one of
them is undergoing heavy updates.

However, I once saw a much smaller inconsistency during my tests even
with a previous revision. Maybe there is a fundamental issue in when and
how the coalesced backlog is replayed, and this commit just makes the
corruptions more likely. This may even be a QEMU issue in the cirrus/vga
model (both qemu-kvm and upstream show the effect).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Avi Kivity

 On 10/14/2010 02:04 PM, Avi Kivity wrote:

 On 10/14/2010 09:27 AM, Jan Kiszka wrote:

Hi,

I'm seeing quite frequent corruptions of the VESA frame buffer with
Linux guests (vga=0x317) that are starting with KVM kernel modules of
upcoming 2.6.36 (I'm currently running -rc7). The effect disappears when
downgrading to kvm-kmod-2.6.35.6. Will see if I can bisect later, but
maybe someone already has an idea or wants to reproduce (just run
something like find / on one text console and switch to another one -
text fragments will remain on the screen on every few switches).



Reproduces on kvm.git.  I wonder what's going on.

Looks like vesafb uses the bios to switch the display start, so I 
expect a problem in qemu reacting to this.




Hm, you said it is a kernel regression.  Maybe it's an issue with dirty 
bit tracking.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Daniel P. Berrange
On Thu, Oct 14, 2010 at 02:07:17PM +0200, Avi Kivity wrote:
  On 10/14/2010 12:54 AM, Anthony Liguori wrote:
 On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:
 
 Hi,
 
 Using the legacy way of starting up NICs, I am hitting a limitation 
 after 29
 NICs ie no more than 29 are detected (that's because of the 32 PCI slot
 limit on a single bus- 3 are already taken up)
 I had initially increased the MAX_NICS to 48, just on my tree, to get to
 more, but of course that won't work.
 Is there any way to go beyond 29 NICs the legacy way?  What is the 
 maximum
 that can be supported by the qdev method?
 
 I got up to 104 without trying very hard using the following script:
 
 args=""
 for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
     for fn in 0 1 2 3 4 5 6 7; do
         args="$args -netdev user,id=eth${slot}_${fn}"
         args="$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile="
     done
 done
 
 x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm
 
 The key is to make the virtio-net devices multifunction and to fill 
 out all 8 functions for each slot.
 
 This is unlikely to work right wrt pci hotplug.  If we want to support a 
 large number of interfaces, we need true multiport cards.

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Jan Kiszka
On 14.10.2010 14:11, Avi Kivity wrote:
  On 10/14/2010 02:04 PM, Avi Kivity wrote:
  On 10/14/2010 09:27 AM, Jan Kiszka wrote:
 Hi,

 I'm seeing quite frequent corruptions of the VESA frame buffer with
 Linux guests (vga=0x317) that are starting with KVM kernel modules of
 upcoming 2.6.36 (I'm currently running -rc7). The effect disappears when
 downgrading to kvm-kmod-2.6.35.6. Will see if I can bisect later, but
 maybe someone already has an idea or wants to reproduce (just run
 something like find / on one text console and switch to another one -
 text fragments will remain on the screen on every few switches).


 Reproduces on kvm.git.  I wonder what's going on.

 Looks like vesafb uses the bios to switch the display start, so I
 expect a problem in qemu reacting to this.

 
 Hm, you said it is a kernel regression.  Maybe it's an issue with dirty
 bit tracking.
 

Ah, cross-posting...

It need not be a kernel thing, see my other mail.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Avi Kivity

 On 10/14/2010 02:10 PM, Jan Kiszka wrote:

Am 14.10.2010 09:27, Jan Kiszka wrote:
  Hi,

  I'm seeing quite frequent corruptions of the VESA frame buffer with
  Linux guests (vga=0x317) that are starting with KVM kernel modules of
  upcoming 2.6.36 (I'm currently running -rc7). The effect disappears when
  downgrading to kvm-kmod-2.6.35.6. Will see if I can bisect later, but
  maybe someone already has an idea or wants to reproduce (just run
  something like find / on one text console and switch to another one -
  text fragments will remain on the screen on every few switches).

Commit d25f31f488e5f7597c17a3ac7d82074de8138e3b in kvm.git (KVM: x86:
avoid unnecessary bitmap allocation when memslot is clean) is at least
magnifying the issue. With this patch applied, I can easily trigger
display corruptions when switching between VGA consoles while one of
them is undergoing heavy updates.

However, I once saw a much smaller inconsistency during my tests even
with a previous revision. Maybe there is a fundamental issue in when and
how the coalesced backlog is replayed,


I didn't see any mmio writes to the framebuffer, so I don't think 
coalescing plays a part here.



and this commit just makes the
corruptions more likely. This may even be a QEMU issue in the cirrus/vga
model (both qemu-kvm and upstream show the effect).



What about -no-kvm?

--
error compiling committee.c: too many arguments to function



Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Krishna Kumar2/India/IBM wrote on 10/14/2010 02:34:01 PM:

 void vhost_poll_queue(struct vhost_poll *poll)
 {
 struct vhost_virtqueue *vq = vhost_find_vq(poll);

 vhost_work_queue(vq, &poll->work);
 }

 Since poll batches packets, find_vq does not seem to add much
 to the CPU utilization (or BW). I am sure that code can be
 optimized much better.

 The results I sent in my last mail were without your use_mm
 patch, and the only tuning was to make vhost threads run on
 only cpus 0-3 (though the performance is good even without
 that). I will test it later today with the use_mm patch too.

There's a significant reduction in CPU/SD utilization with your
patch. Following is the performance of ORG vs MQ+mm patch:

_
           Org vs MQ+mm patch txq=2
#     BW%      CPU%     RCPU%    SD%      RSD%
_
1     2.26     -1.16    .27      -20.00   0
2     35.07    29.90    21.81    0        -11.11
4     55.03    84.57    37.66    26.92    -4.62
8     73.16    118.69   49.21    45.63    -.46
16    77.43    98.81    47.89    24.07    -7.80
24    71.59    105.18   48.44    62.84    18.18
32    70.91    102.38   47.15    49.22    8.54
40    63.26    90.58    41.00    85.27    37.33
48    45.25    45.99    11.23    14.31    -12.91
64    42.78    41.82    5.50     .43      -25.12
80    31.40    7.31     -18.69   15.78    -11.93
96    27.60    7.79     -18.54   17.39    -10.98
128   23.46    -11.89   -34.41   -.41     -25.53
_
BW: 40.2   CPU/RCPU: 29.9,-2.2   SD/RSD: 12.0,-15.6


Following is the performance of MQ vs MQ+mm patch:
_
MQ vs MQ+mm patch
#     BW%      CPU%     RCPU%    SD%      RSD%
_
1     4.98     -.58     .84      -20.00   0
2     5.17     2.96     2.29     0        -4.00
4     -.18     .25      -.16     3.12     .98
8     -5.47    -1.36    -1.98    17.18    16.57
16    -1.90    -6.64    -3.54    -14.83   -12.12
24    -.01     23.63    14.65    57.61    46.64
32    .27      -3.19    -3.11    -22.98   -22.91
40    -1.06    -2.96    -2.96    -4.18    -4.10
48    -.28     -2.34    -3.71    -2.41    -3.81
64    9.71     33.77    30.65    81.44    77.09
80    -10.69   -31.07   -31.70   -29.22   -29.88
96    -1.14    5.98     .56      -11.57   -16.14
128   -.93     -15.60   -18.31   -19.89   -22.65
_
BW: 0   CPU/RCPU: -4.2,-6.1   SD/RSD: -13.1,-15.6
_

Each test case is for 60 secs, sum over two runs (except
when number of netperf sessions is 1, which has 7 runs
of 10 secs each), numcpus=4, numtxqs=8, etc. No tuning
other than taskset each vhost to cpus 0-3.

Thanks,

- KK



Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Avi Kivity

 On 10/14/2010 02:13 PM, Avi Kivity wrote:



and this commit just makes the
corruptions more likely. This may even be a QEMU issue in the cirrus/vga
model (both qemu-kvm and upstream show the effect).



What about -no-kvm?



Doesn't happen there.

My guess is the initial dirty log after the switch does not contain all 
1s so memory isn't updated.  Probably need to force a full redraw in 
that case.


--
error compiling committee.c: too many arguments to function



Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Jan Kiszka
On 14.10.2010 14:13, Avi Kivity wrote:
   On 10/14/2010 02:10 PM, Jan Kiszka wrote:
 Am 14.10.2010 09:27, Jan Kiszka wrote:
  Hi,

  I'm seeing quite frequent corruptions of the VESA frame buffer with
  Linux guests (vga=0x317) that are starting with KVM kernel modules of
  upcoming 2.6.36 (I'm currently running -rc7). The effect disappears when
  downgrading to kvm-kmod-2.6.35.6. Will see if I can bisect later, but
  maybe someone already has an idea or wants to reproduce (just run
  something like find / on one text console and switch to another one -
  text fragments will remain on the screen on every few switches).

 Commit d25f31f488e5f7597c17a3ac7d82074de8138e3b in kvm.git (KVM: x86:
 avoid unnecessary bitmap allocation when memslot is clean) is at least
 magnifying the issue. With this patch applied, I can easily trigger
 display corruptions when switching between VGA consoles while one of
 them is undergoing heavy updates.

 However, I once saw a much smaller inconsistency during my tests even
 with a previous revision. Maybe there is a fundamental issue in when and
 how the coalesced backlog is replayed,
 
 I didn't see any mmio writes to the framebuffer, so I don't think 
 coalescing plays a part here.
 
 and this commit just makes the
 corruptions more likely. This may even be a QEMU issue in the cirrus/vga
 model (both qemu-kvm and upstream show the effect).

 
 What about -no-kvm?

Just booted it (took ages), and the result was actually a completely
black screen. Kind of persistent corruption. This really looks like a
qemu issue now, maybe even a regression as I don't remember running into
such effects a while back.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Markus Armbruster
Avi Kivity a...@redhat.com writes:

  On 10/14/2010 12:54 AM, Anthony Liguori wrote:
 On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:

 Hi,

 Using the legacy way of starting up NICs, I am hitting a limitation
 after 29
 NICs ie no more than 29 are detected (that's because of the 32 PCI slot
 limit on a single bus- 3 are already taken up)
 I had initially increased the MAX_NICS to 48, just on my tree, to get to
 more, but of course that won't work.
 Is there any way to go beyond 29 NICs the legacy way?  What is the
 maximum
 that can be supported by the qdev method?

 I got up to 104 without trying very hard using the following script:

 args=""
 for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
     for fn in 0 1 2 3 4 5 6 7; do
         args="$args -netdev user,id=eth${slot}_${fn}"
         args="$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile="
     done
 done
 
 x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm

 The key is to make the virtio-net devices multifunction and to fill
 out all 8 functions for each slot.

I'm amazed that works.  Can't see how creating another qdev in the same
slot makes a proper multifunction device.

 This is unlikely to work right wrt pci hotplug.  If we want to support
 a large number of interfaces, we need true multiport cards.

Indeed.  As far as I know, we can't hot plug multifunction PCI devices.

 What's the motivation for such a huge number of interfaces?


Re: Frame buffer corruptions with KVM = 2.6.36

2010-10-14 Thread Avi Kivity

 On 10/14/2010 02:36 PM, Jan Kiszka wrote:


  and this commit just makes the
  corruptions more likely. This may even be a QEMU issue in the cirrus/vga
  model (both qemu-kvm and upstream show the effect).


  What about -no-kvm?

Just booted it (took ages), and the result was actually a completely
black screen. Kind of persistent corruption. This really looks like a
qemu issue now, maybe even a regression as I don't remember running into
such effects a while back.


Worked fine for me (though yes it was slow - did tcg regress?).

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Fix signature of kvm_iommu_map_pages stub

2010-10-14 Thread Jan Kiszka
On 14.10.2010 13:59, Jan Kiszka wrote:
 Breaks otherwise if CONFIG_IOMMU_API is not set.

Actually, it only broke a special local version. It doesn't break with
current KVM due to

[__kvm_set_memory_region:]
#ifdef CONFIG_DMAR
	/* map the pages in iommu page table */
	if (npages) {
		r = kvm_iommu_map_pages(kvm, new);
		if (r)
			goto out_free;
	}
#endif

And CONFIG_IOMMU_API is set when CONFIG_DMAR is enabled. But do we only
need this call on Intel?

Jan

 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  include/linux/kvm_host.h |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 0b89d00..866ed30 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -483,8 +483,7 @@ int kvm_deassign_device(struct kvm *kvm,
   struct kvm_assigned_dev_kernel *assigned_dev);
  #else /* CONFIG_IOMMU_API */
  static inline int kvm_iommu_map_pages(struct kvm *kvm,
 -   gfn_t base_gfn,
 -   unsigned long npages)
 +   struct kvm_memory_slot *slot)
  {
   return 0;
  }

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Krishna Kumar2/India/IBM wrote on 10/14/2010 05:47:54 PM:

Sorry, it should read txq=8 below.

- KK

 There's a significant reduction in CPU/SD utilization with your
 patch. Following is the performance of ORG vs MQ+mm patch:

 _
            Org vs MQ+mm patch txq=2
 #     BW%      CPU%     RCPU%    SD%      RSD%
 _
 1     2.26     -1.16    .27      -20.00   0
 2     35.07    29.90    21.81    0        -11.11
 4     55.03    84.57    37.66    26.92    -4.62
 8     73.16    118.69   49.21    45.63    -.46
 16    77.43    98.81    47.89    24.07    -7.80
 24    71.59    105.18   48.44    62.84    18.18
 32    70.91    102.38   47.15    49.22    8.54
 40    63.26    90.58    41.00    85.27    37.33
 48    45.25    45.99    11.23    14.31    -12.91
 64    42.78    41.82    5.50     .43      -25.12
 80    31.40    7.31     -18.69   15.78    -11.93
 96    27.60    7.79     -18.54   17.39    -10.98
 128   23.46    -11.89   -34.41   -.41     -25.53
 _
 BW: 40.2   CPU/RCPU: 29.9,-2.2   SD/RSD: 12.0,-15.6

 Following is the performance of MQ vs MQ+mm patch:
 _
 MQ vs MQ+mm patch
 #     BW%      CPU%     RCPU%    SD%      RSD%
 _
 1     4.98     -.58     .84      -20.00   0
 2     5.17     2.96     2.29     0        -4.00
 4     -.18     .25      -.16     3.12     .98
 8     -5.47    -1.36    -1.98    17.18    16.57
 16    -1.90    -6.64    -3.54    -14.83   -12.12
 24    -.01     23.63    14.65    57.61    46.64
 32    .27      -3.19    -3.11    -22.98   -22.91
 40    -1.06    -2.96    -2.96    -4.18    -4.10
 48    -.28     -2.34    -3.71    -2.41    -3.81
 64    9.71     33.77    30.65    81.44    77.09
 80    -10.69   -31.07   -31.70   -29.22   -29.88
 96    -1.14    5.98     .56      -11.57   -16.14
 128   -.93     -15.60   -18.31   -19.89   -22.65
 _
 BW: 0   CPU/RCPU: -4.2,-6.1   SD/RSD: -13.1,-15.6
 _

 Each test case is for 60 secs, sum over two runs (except
 when number of netperf sessions is 1, which has 7 runs
 of 10 secs each), numcpus=4, numtxqs=8, etc. No tuning
 other than taskset each vhost to cpus 0-3.

 Thanks,

 - KK



Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:07 AM, Avi Kivity wrote:

 On 10/14/2010 12:54 AM, Anthony Liguori wrote:

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:


Hi,

Using the legacy way of starting up NICs, I am hitting a limitation 
after 29

NICs ie no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus- 3 are already taken up)
I had initially increased the MAX_NICS to 48, just on my tree, to 
get to

more, but of course that won't work.
Is there any way to go beyond 29 NICs the legacy way?  What is the 
maximum

that can be supported by the qdev method?


I got up to 104 without trying very hard using the following script:

args=""
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
    for fn in 0 1 2 3 4 5 6 7; do
        args="$args -netdev user,id=eth${slot}_${fn}"
        args="$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile="
    done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm


The key is to make the virtio-net devices multifunction and to fill 
out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.


Yes.  Our hotplug design is based on devices.  This is wrong, it should 
be based on bus-level concepts (like PCI slots).


If we want to support a large number of interfaces, we need true 
multiport cards.


This magic here creates a multiport virtio-net card so I'm not really 
sure what you're suggesting.  It would certainly be nice to make this 
all more user friendly (and make hotplug work).


Regards,

Anthony Liguori


What's the motivation for such a huge number of interfaces?




Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:10 AM, Daniel P. Berrange wrote:

On Thu, Oct 14, 2010 at 02:07:17PM +0200, Avi Kivity wrote:
   

  On 10/14/2010 12:54 AM, Anthony Liguori wrote:
 

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:
   

Hi,

Using the legacy way of starting up NICs, I am hitting a limitation
after 29
NICs ie no more than 29 are detected (that's because of the 32 PCI slot
limit on a single bus- 3 are already taken up)
I had initially increased the MAX_NICS to 48, just on my tree, to get to
more, but of course that won't work.
Is there any way to go beyond 29 NICs the legacy way?  What is the
maximum
that can be supported by the qdev method?
 

I got up to 104 without trying very hard using the following script:

args=""
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
    for fn in 0 1 2 3 4 5 6 7; do
        args="$args -netdev user,id=eth${slot}_${fn}"
        args="$args -device virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile="
    done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args} -enable-kvm

The key is to make the virtio-net devices multifunction and to fill
out all 8 functions for each slot.
   

This is unlikely to work right wrt pci hotplug.  If we want to support a
large number of interfaces, we need true multiport cards.
 

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.
   


I've always been sceptical of this.  When physical systems have a large 
number of NICs, it's via multiple functions, not a bunch of PCI bridges.


With just a handful of 8-port NICs, you can exceed the current 
slot-based limit on physical hardware.  It's not an extremely common 
configuration but it does exist.


BTW, I don't think it's possible to hot-add physical functions.  I 
believe I know of a card that supports dynamic add of physical functions 
(pre-dating SR-IOV).


Regards,

Anthony Liguori


Daniel
   




Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 07:36 AM, Markus Armbruster wrote:

Avi Kivitya...@redhat.com  writes:

   

  On 10/14/2010 12:54 AM, Anthony Liguori wrote:
 

On 10/13/2010 05:32 PM, Anjali Kulkarni wrote:
   

Hi,

Using the legacy way of starting up NICs, I am hitting a limitation
after 29 NICs, i.e. no more than 29 are detected (that's because of the
32 PCI slot limit on a single bus - 3 are already taken up).
I had initially increased MAX_NICS to 48, just on my tree, to get to
more, but of course that won't work.
Is there any way to go beyond 29 NICs the legacy way?  What is the
maximum that can be supported by the qdev method?
 

I got up to 104 without trying very hard using the following script:

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
 args=$args -netdev user,id=eth${slot}_${fn}
 args=$args -device
virtio-net-pci,addr=${slot}.${fn},netdev=eth${slot}_${fn},multifunction=on,romfile=
done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
-enable-kvm

The key is to make the virtio-net devices multifunction and to fill
out all 8 functions for each slot.
   

I'm amazed that works.  Can't see how creating another qdev in the same
slot makes a proper multifunction device.
   


multifunction=on sets the multifunction bit for the PCI device.  Then 
it's a matter of setting the address to be a specific function.


Our default platform devices are actually multifunction.


This is unlikely to work right wrt pci hotplug.  If we want to support
a large number of interfaces, we need true multiport cards.
 

Indeed.  As far as I know, we can't hot plug multifunction PCI devices.
   


Yup.

Regards,

Anthony Liguori


What's the motivation for such a huge number of interfaces?
 

   




Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Avi Kivity

 On 10/14/2010 02:54 PM, Anthony Liguori wrote:
The key is to make the virtio-net devices multifunction and to fill 
out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.



Yes.  Our hotplug design is based on devices.  This is wrong, it 
should be based on bus-level concepts (like PCI slots).


If we want to support a large number of interfaces, we need true 
multiport cards.


This magic here creates a multiport virtio-net card so I'm not really 
sure what you're suggesting.  It would certainly be nice to make this 
all more user friendly (and make hotplug work).




The big issue is to fix hotplug.

I don't see how we can make it user friendly, without making the 
ordinary case even more unfriendly.  Looks like we need yet another 
level of indirection here.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] [RFC] Add support for a USB audio device model

2010-10-14 Thread Mike Snitzer
On Tue, Sep 14, 2010 at 1:56 AM, H. Peter Anvin h...@zytor.com wrote:
 On 09/13/2010 06:37 PM, Amos Kong wrote:
 I've heard wonderful music (guest: win7), but mixed with a little noise, not 
 so fluent.
 The following debug msg is normal?

 Yes, all of that is normal.  I talked to malc earlier today, and I think
 I have a pretty good idea for how to deal with the rate-matching issues;
 I'm going to try to write it up tomorrow.

Hi,

Was just wondering if you've been able to put some time to the
rate-matching issues?

Has this usb-audio patch evolved and I'm just missing it?

Thanks for doing this work!
Mike


Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 08:23 AM, Avi Kivity wrote:

 On 10/14/2010 02:54 PM, Anthony Liguori wrote:
The key is to make the virtio-net devices multifunction and to fill 
out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.



Yes.  Our hotplug design is based on devices.  This is wrong, it 
should be based on bus-level concepts (like PCI slots).


If we want to support a large number of interfaces, we need true 
multiport cards.


This magic here creates a multiport virtio-net card so I'm not really 
sure what you're suggesting.  It would certainly be nice to make this 
all more user friendly (and make hotplug work).




The big issue is to fix hotplug.


Yes, but this is entirely independent of multifunction devices.

Today we shoe-horn hot remove into device_del.  Instead, we should have 
explicit bus-level interfaces for hot remove.


Regards,

Anthony Liguori

I don't see how we can make it user friendly, without making the 
ordinary case even more unfriendly.  Looks like we need yet another 
level of indirection here.






Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Avi Kivity

 On 10/14/2010 04:11 PM, Anthony Liguori wrote:

On 10/14/2010 08:23 AM, Avi Kivity wrote:

 On 10/14/2010 02:54 PM, Anthony Liguori wrote:
The key is to make the virtio-net devices multifunction and to 
fill out all 8 functions for each slot.


This is unlikely to work right wrt pci hotplug.



Yes.  Our hotplug design is based on devices.  This is wrong, it 
should be based on bus-level concepts (like PCI slots).


If we want to support a large number of interfaces, we need true 
multiport cards.


This magic here creates a multiport virtio-net card so I'm not 
really sure what you're suggesting.  It would certainly be nice to 
make this all more user friendly (and make hotplug work).




The big issue is to fix hotplug.


Yes, but this is entirely independent of multifunction devices.

Today we shoe-horn hot remove into device_del.  Instead, we should 
have explicit bus-level interfaces for hot remove.


I'm not saying multifunction is not the right way to approach this (it is).  
The only concern is to get hotplug right.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] [RFC] Add support for a USB audio device model

2010-10-14 Thread H. Peter Anvin
On 10/14/2010 06:51 AM, Mike Snitzer wrote:
 
 Was just wondering if you've been able to put some time to the
 rate-matching issues?
 
 Has this usb-audio patch evolved and I'm just missing it?
 
 Thanks for doing this work!
 Mike
 

The sad result really is: it doesn't work, and it probably will never work.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PULL 0/3] PPC BookE patches

2010-10-14 Thread Avi Kivity

 On 10/10/2010 12:44 PM, Alexander Graf wrote:

Hi Marcelo / Avi,

Scott has found some bugs in the BookE implementation of KVM, so please pull
the fixes for them to the kvm tree.

The following changes since commit 3c4504636ab1ff41ec162980bf85121aee14e58f:
   Huang Ying (1):
 KVM: MCE: Send SRAR SIGBUS directly

are available in the git repository at:

   git://github.com/agraf/linux-2.6.git kvm-ppc-next

Scott Wood (3):
   KVM: PPC: BookE: Load the lower half of MSR
   KVM: PPC: BookE: fix sleep with interrupts disabled
   KVM: PPC: e500: Call kvm_vcpu_uninit() before kvmppc_e500_tlb_uninit().



Pulled, thanks.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model

2010-10-14 Thread Alon Levy

- H. Peter Anvin h...@zytor.com wrote:

 On 10/14/2010 06:51 AM, Mike Snitzer wrote:
  
  Was just wondering if you've been able to put some time to the
  rate-matching issues?
  
  Has this usb-audio patch evolved and I'm just missing it?
  
  Thanks for doing this work!
  Mike
  
 
 The sad result really is: it doesn't work, and it probably will never
 work.
 

Can you elaborate?

   -hpa
 
 -- 
 H. Peter Anvin, Intel Open Source Technology Center
 I work for Intel.  I don't speak on their behalf.


Re: [patch 0/8] port qemu-kvm's MCE support (v3)

2010-10-14 Thread Marcelo Tosatti
On Thu, Oct 14, 2010 at 12:25:34PM +0200, Avi Kivity wrote:
  On 10/11/2010 08:31 PM, Marcelo Tosatti wrote:
 Port qemu-kvm's KVM MCE (Machine Check Exception) handling to qemu. It
 allows qemu to propagate MCEs to the guest.
 
 v2:
 - rename do_qemu_ram_addr_from_host.
 - fix kvm_on_sigbus/kvm_on_sigbus_vcpu naming.
 - fix bank register restoration (Dean Nelson).
 
 v3:
 - condition MCE generation on MCE_SEG_P bit (Huang Ying).
 
 
 I only see patches 1 and 4 from v2, and this cover letter from v3.
 Please repost.

Done.

 Also, if the patchset ends up with qemu-kvm master being different
 from uq/master in this area, please post the corresponding qemu-kvm
 master patches.

Nope. I'll fix it up on the next qemu merge.



Re: [Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model

2010-10-14 Thread H. Peter Anvin
On 10/14/2010 09:18 AM, Alon Levy wrote:
 
 Can you elaborate?
 

The quality of rate information is too low, and the delays in the system
are too large to enable consistent convergence.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



[PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Alex Williamson
We can't let the compiler define the alignment for qemu_cfg data.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

0.13 stable candidate?

 hw/pc.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..90839bd 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -75,12 +75,12 @@ struct e820_entry {
 uint64_t address;
 uint64_t length;
 uint32_t type;
-};
+} __attribute__((__packed__));
 
 struct e820_table {
 uint32_t count;
 struct e820_entry entry[E820_NR_ENTRIES];
-};
+} __attribute__((__packed__));
 
 static struct e820_table e820_table;
 



[PATCH] vga: Mark VBE area as reserved in e820 tables

2010-10-14 Thread Alex Williamson
Otherwise the guest might try to use the range for device hotplug.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/vga.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/vga.c b/hw/vga.c
index 966185e..90f9dc0 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2331,6 +2331,14 @@ void vga_init(VGACommonState *s)
 void vga_init_vbe(VGACommonState *s)
 {
 #ifdef CONFIG_BOCHS_VBE
+#if defined (TARGET_I386)
+if (e820_add_entry(VBE_DISPI_LFB_PHYSICAL_ADDRESS,
+   VGA_RAM_SIZE, E820_RESERVED)  0) {
+fprintf(stderr,
+Warning: unable to register VBE range as e820 reserved\n);
+}
+#endif
+
 /* XXX: use optimized standard vga accesses */
 cpu_register_physical_memory(VBE_DISPI_LFB_PHYSICAL_ADDRESS,
  VGA_RAM_SIZE, s-vram_offset);



Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Jes Sorensen
On 10/14/10 20:33, Alex Williamson wrote:
 We can't let the compiler define the alignment for qemu_cfg data.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
 0.13 stable candidate?

ACK I would say so.

Jes


Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Anthony Liguori

On 10/14/2010 02:44 PM, Jes Sorensen wrote:

On 10/14/10 20:33, Alex Williamson wrote:
   

We can't let the compiler define the alignment for qemu_cfg data.

Signed-off-by: Alex Williamsonalex.william...@redhat.com
---

0.13 stable candidate?
 

ACK I would say so.
   


fw_cfg interfaces are somewhat difficult to rationalize about for 
compatibility.


0.13.0 is tagged already so it's too late to pull it in there.  If we 
say we don't care about compatibility at the fw_cfg level, then it 
doesn't matter if we pull it into stable-0.13.  If we do care, then this 
is an ABI breaker.


I don't know that the answer is obvious to me.

Regards,

Anthony Liguori


Jes

   




Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Alex Williamson
On Thu, 2010-10-14 at 14:48 -0500, Anthony Liguori wrote:
 On 10/14/2010 02:44 PM, Jes Sorensen wrote:
  On 10/14/10 20:33, Alex Williamson wrote:
 
  We can't let the compiler define the alignment for qemu_cfg data.
 
  Signed-off-by: Alex Williamsonalex.william...@redhat.com
  ---
 
  0.13 stable candidate?
   
  ACK I would say so.
 
 
 fw_cfg interfaces are somewhat difficult to rationalize about for 
 compatibility.
 
 0.13.0 is tagged already so it's too late to pull it in there.  If we 
 say we don't care about compatibility at the fw_cfg level, then it 
 doesn't matter if we pull it into stable-0.13.  If we do care, then this 
 is an ABI breaker.

If it works anywhere (I assume it works on 32bit), then it's only
because it happened to get the alignment right.  This just makes 64bit
hosts get it right too.  I don't see any compatibility issues,
non-packed + 64bit = broken.  Thanks,

Alex



Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Anthony Liguori

On 10/14/2010 02:58 PM, Alex Williamson wrote:

On Thu, 2010-10-14 at 14:48 -0500, Anthony Liguori wrote:
   

On 10/14/2010 02:44 PM, Jes Sorensen wrote:
 

On 10/14/10 20:33, Alex Williamson wrote:

   

We can't let the compiler define the alignment for qemu_cfg data.

Signed-off-by: Alex Williamsonalex.william...@redhat.com
---

0.13 stable candidate?

 

ACK I would say so.

   

fw_cfg interfaces are somewhat difficult to rationalize about for
compatibility.

0.13.0 is tagged already so it's too late to pull it in there.  If we
say we don't care about compatibility at the fw_cfg level, then it
doesn't matter if we pull it into stable-0.13.  If we do care, then this
is an ABI breaker.
 

If it works anywhere (I assume it works on 32bit), then it's only
because it happened to get the alignment right.  This just makes 64bit
hosts get it right too.  I don't see any compatibility issues,
non-packed + 64bit = broken.  Thanks,
   


Ok, I'll buy that argument :-)

Regards,

Anthony Liguori


Alex

   




Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Arnd Bergmann
On Thursday 14 October 2010 21:58:08 Alex Williamson wrote:
 If it works anywhere (I assume it works on 32bit), then it's only
 because it happened to get the alignment right.  This just makes 64bit
 hosts get it right too.  I don't see any compatibility issues,
 non-packed + 64bit = broken.  Thanks,

I would actually assume that only x86-32 hosts got it right, because
all 32 bit hosts I've seen other than x86 also define 8 byte alignment
for uint64_t.

You might however consider making it 

__attribute((__packed__, __aligned__(4)))

instead of just packed, because otherwise you make the alignment one byte,
which is not only different from what it used to be on x86-32 but also
will cause inefficient compiler output on platforms that don't have unaligned
word accesses in hardware.

Arnd


Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Alex Williamson
On Thu, 2010-10-14 at 22:20 +0200, Arnd Bergmann wrote:
 On Thursday 14 October 2010 21:58:08 Alex Williamson wrote:
  If it works anywhere (I assume it works on 32bit), then it's only
  because it happened to get the alignment right.  This just makes 64bit
  hosts get it right too.  I don't see any compatibility issues,
  non-packed + 64bit = broken.  Thanks,
 
 I would actually assume that only x86-32 hosts got it right, because
 all 32 bit hosts I've seen other than x86 also define 8 byte alignment
 for uint64_t.
 
 You might however consider making it 
 
 __attribute((__packed__, __aligned__(4)))
 
 instead of just packed, because otherwise you make the alignment one byte,
 which is not only different from what it used to be on x86-32 but also
 will cause inefficient compiler outpout on platforms that don't have unaligned
 word accesses in hardware.

The structs in question only contain 4 and 8 byte elements, so there
shouldn't be any change on x86-32 using one-byte aligned packing.
AFAIK, e820 is x86-only, so we don't need to worry about breaking anyone
else.  Performance isn't much of a consideration for this type of
interface since it's only used pre-boot.  In fact, the channel between
qemu and the bios is only one byte wide, so wider alignment can cost
extra emulated I/O accesses.  Thanks,

Alex



Re: [Qemu-devel] Re: [PATCH] pc: e820 qemu_cfg tables need to be packed

2010-10-14 Thread Arnd Bergmann
On Thursday 14 October 2010 22:59:04 Alex Williamson wrote:
 The structs in question only contain 4  8 byte elements, so there
 shouldn't be any change on x86-32 using one-byte aligned packing.

I'm talking about the alignment of the structure, not the members
within the structure. The data structure should be compatible, but
not accesses to it.

 AFAIK, e820 is x86-only, so we don't need to worry about breaking anyone
 else.

You can use qemu to emulate an x86 pc on anything...

 Performance isn't much of a consideration for this type of
 interface since it's only used pre-boot.  In fact, the channel between
 qemu and the bios is only one byte wide, so wider alignment can cost
 extra emulated I/O accesses.

Right, the data gets passed as bytes, so it hardly matters in the end.
Still, e820_add_entry assigns data to the struct members, which
it does either using byte accesses and shifts or multiple 32-bit
assignments. Just because using one-byte alignment technically
results in correct output doesn't make it the right solution.

I don't care about the few cycles of execution time or the few bytes
you waste in this particular case, but you are setting a wrong example
by using smaller alignment than necessary.

Arnd


TODO item: guest programmable mac/vlan filtering with macvtap

2010-10-14 Thread Dragos Tatulea
Hi,

I'm starting a  thread related to the TODO item mentioned in the
subject. Currently still gathering info and trying to make kvm 
macvtap play nicely together. I have used this [1] guide to set it up
but qemu is still complaining about the PCI device address of the
virtio-net-pci. Tried with latest qemu. Am I missing something here?

[1] - http://virt.kernelnewbies.org/MacVTap

-- Dragos


Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Richard W.M. Jones

On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:
 Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
 any type of device we emulate.

Break the 29/30/31 virtio-blk limit ... please!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/


Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anjali Kulkarni
Can you send me pointers to the qdev documentation? How can I use it? Will
it allow us to scale above the 32 PCI limit?

Anjali


On 10/14/10 2:57 PM, Anthony Liguori anth...@codemonkey.ws wrote:

 On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:
 On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:

 Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
 any type of device we emulate.
  
 Break the 29/30/31 virtio-blk limit ... please!

 
 It was broken ages ago:
 
 anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions
 422 /proc/partitions
   251 1618  1 vdcx2
   251 1621 489951 vdcx5
   251 1632   10485760 vdcy
   251 16339992398 vdcy1
   251 1634  1 vdcy2
   251 1637 489951 vdcy5
   251 1648   10485760 vdcz
   251 16499992398 vdcz1
   251 1650  1 vdcz2
   251 1653 489951 vdcz5
 
 This is what makes qdev so useful.
 
 args=
 for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
 for fn in 0 1 2 3 4 5 6 7; do
  args=$args -drive
 file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn}
  args=$args -device
 virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on
 done
 done
 
 x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
 -enable-kvm -serial stdio
 
 Regards,
 
 Anthony Liguori
 
 Rich.
 

 



Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Richard W.M. Jones
On Thu, Oct 14, 2010 at 04:57:36PM -0500, Anthony Liguori wrote:
 On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:
 On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:
 Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
 any type of device we emulate.
 Break the 29/30/31 virtio-blk limit ... please!
 
 It was broken ages ago:
[...]

Excellent news indeed.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora


Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anthony Liguori

On 10/14/2010 05:00 PM, Anjali Kulkarni wrote:

Can you send me pointers to the qdev documentation? How can I use it? Will
it allow us to scale above the 32 PCI limit?
   


It's all below.  You just have to create a PCI device, set the 
multifunction flag to on, and then assign it a PCI address that includes 
a function number.  Then you can pack 8 virtio PCI devices into a single 
slot.
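
As a minimal sketch of that recipe (slot number, netdev ids, and the
user-mode backend are illustrative assumptions), the -device arguments for
packing two virtio-net functions into one slot can be generated like this:

```shell
# Pack two virtio-net functions into PCI slot 5 (functions 0 and 1).
# Extending fn to 0..7 fills the whole slot, as in the script below.
args=""
for fn in 0 1; do
    args="$args -netdev user,id=net5_$fn"
    args="$args -device virtio-net-pci,addr=05.$fn,netdev=net5_$fn,multifunction=on"
done
echo "$args"
```

Appending $args to the usual qemu-system-x86_64 command line would then
populate both functions of slot 5.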


Regards,

Anthony Liguori


Anjali


On 10/14/10 2:57 PM, Anthony Liguorianth...@codemonkey.ws  wrote:

   

On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:
 

On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:

   

Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
any type of device we emulate.

 

Break the 29/30/31 virtio-blk limit ... please!

   

It was broken ages ago:

anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions
422 /proc/partitions
   251 1618  1 vdcx2
   251 1621 489951 vdcx5
   251 1632   10485760 vdcy
   251 16339992398 vdcy1
   251 1634  1 vdcy2
   251 1637 489951 vdcy5
   251 1648   10485760 vdcz
   251 16499992398 vdcz1
   251 1650  1 vdcz2
   251 1653 489951 vdcz5

This is what makes qdev so useful.

args=
for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
for fn in 0 1 2 3 4 5 6 7; do
  args=$args -drive
file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn}
  args=$args -device
virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on
done
done

x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
-enable-kvm -serial stdio

Regards,

Anthony Liguori

 

Rich.


   

 
   




Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-14 Thread Anjali Kulkarni
Thanks. Does this work for e1000 as well?
Also, does it support pci hotplug?

Anjali

On 10/14/10 3:09 PM, Anthony Liguori anth...@codemonkey.ws wrote:

 On 10/14/2010 05:00 PM, Anjali Kulkarni wrote:
 Can you send me pointers to the qdev documentation? How can I use it? Will
 it allow us to scale above the 32 PCI limit?

 
 It's all below.  You just have to create a PCI device and mark the
 multifunction flag to on and then assign it a PCI address that includes
 a function number.  Then you can pack 8 virtio PCI devices into a single
 slot.
 
 Regards,
 
 Anthony Liguori
 
 Anjali
 
 
 On 10/14/10 2:57 PM, Anthony Liguorianth...@codemonkey.ws  wrote:
 

 On 10/14/2010 04:42 PM, Richard W.M. Jones wrote:
  
 On Thu, Oct 14, 2010 at 01:10:47PM +0100, Daniel P. Berrange wrote:
 

 Or a PCI bridge to wire up more PCI buses, so we raise the max limit for
 any type of device we emulate.
 
  
 Break the 29/30/31 virtio-blk limit ... please!
 

 It was broken ages ago:
 
 anth...@howler:~$ wc -l /proc/partitions; tail /proc/partitions
 422 /proc/partitions
251 1618  1 vdcx2
251 1621 489951 vdcx5
251 1632   10485760 vdcy
251 16339992398 vdcy1
251 1634  1 vdcy2
251 1637 489951 vdcy5
251 1648   10485760 vdcz
251 16499992398 vdcz1
251 1650  1 vdcz2
251 1653 489951 vdcz5
 
 This is what makes qdev so useful.
 
 args=
 for slot in 5 6 7 8 9 10 11 12 13 14 15 16 17; do
 for fn in 0 1 2 3 4 5 6 7; do
   args=$args -drive
 
file=/home/anthony/images/linux.img,if=none,snapshot=on,id=disk${slot}_${fn}

   args=$args -device
 virtio-blk-pci,addr=${slot}.${fn},drive=disk${slot}_${fn},multifunction=on
 done
 done
 
 x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img ${args}
 -enable-kvm -serial stdio
 
 Regards,
 
 Anthony Liguori
 
  
 Rich.
 
 

  

 



  1   2   >