[PATCH 3/6] kvm tools: Rework stdio/stdout handling to support redirection

2013-02-06 Thread Michael Ellerman
Currently if you redirect the output from lkvm run to a file then
term_init() will fail, because it can't call the terminal ioctls.

So check if stdin and stdout are ttys, if either is not then skip the
rest of the terminal setup. Redirecting one but not the other is a
little odd, but does work.

Note that we skip registering the cleanup routines, so we don't need to
modify them.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/term.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index 4413450..fa85e4a 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -140,6 +140,15 @@ int term_init(struct kvm *kvm)
 	struct termios term;
 	int i, r;
 
+	for (i = 0; i < 4; i++)
+		if (term_fds[i][TERM_FD_IN] == 0) {
+			term_fds[i][TERM_FD_IN] = STDIN_FILENO;
+			term_fds[i][TERM_FD_OUT] = STDOUT_FILENO;
+		}
+
+	if (!isatty(STDIN_FILENO) || !isatty(STDOUT_FILENO))
+		return 0;
+
 	r = tcgetattr(STDIN_FILENO, &orig_term);
 	if (r < 0) {
 		pr_warning("unable to save initial standard input settings");
@@ -151,12 +160,6 @@ int term_init(struct kvm *kvm)
 	term.c_lflag &= ~(ICANON | ECHO | ISIG);
 	tcsetattr(STDIN_FILENO, TCSANOW, &term);
 
-	for (i = 0; i < 4; i++)
-		if (term_fds[i][TERM_FD_IN] == 0) {
-			term_fds[i][TERM_FD_IN] = STDIN_FILENO;
-			term_fds[i][TERM_FD_OUT] = STDOUT_FILENO;
-		}
-
 	signal(SIGTERM, term_sig_cleanup);
 	atexit(term_cleanup);
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
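
The check described in [PATCH 3/6] can be sketched in isolation. This is a minimal sketch only, assuming a standalone helper; the name term_needs_setup() is hypothetical and does not exist in kvmtool. It relies on nothing beyond POSIX isatty().

```c
#include <unistd.h>

/*
 * Hypothetical helper, not part of kvmtool: returns 1 only when both
 * descriptors refer to terminals. A caller would skip the termios
 * setup (and the cleanup registration) when this returns 0.
 */
static int term_needs_setup(int in_fd, int out_fd)
{
    /* isatty() returns 1 for a terminal and 0 otherwise */
    return isatty(in_fd) && isatty(out_fd);
}
```

With output redirected to a file or a pipe the helper returns 0, which corresponds to the early "return 0" in the patch.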


[PATCH 2/6] kvm tools: More error handling in the ipc code

2013-02-06 Thread Michael Ellerman
Add perror() calls to a couple of exit paths, to ease debugging.

There are also two places where we print "Failed starting IPC thread",
but one is really an epoll failure, so make that obvious.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/kvm-ipc.c |   17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
index bdcc0d1..7897519 100644
--- a/tools/kvm/kvm-ipc.c
+++ b/tools/kvm/kvm-ipc.c
@@ -49,18 +49,25 @@ static int kvm__create_socket(struct kvm *kvm)
 	}
 
 	s = socket(AF_UNIX, SOCK_STREAM, 0);
-	if (s < 0)
+	if (s < 0) {
+		perror("socket");
 		return s;
+	}
+
 	local.sun_family = AF_UNIX;
 	strlcpy(local.sun_path, full_name, sizeof(local.sun_path));
 	len = strlen(local.sun_path) + sizeof(local.sun_family);
 	r = bind(s, (struct sockaddr *)&local, len);
-	if (r < 0)
+	if (r < 0) {
+		perror("bind");
 		goto fail;
+	}
 
 	r = listen(s, 5);
-	if (r < 0)
+	if (r < 0) {
+		perror("listen");
 		goto fail;
+	}
 
 	return s;
 
@@ -430,6 +437,7 @@ int kvm_ipc__init(struct kvm *kvm)
 
 	epoll_fd = epoll_create(KVM_IPC_MAX_MSGS);
 	if (epoll_fd < 0) {
+		perror("epoll_create");
 		ret = epoll_fd;
 		goto err;
 	}
@@ -437,13 +445,14 @@ int kvm_ipc__init(struct kvm *kvm)
 	ev.events = EPOLLIN | EPOLLET;
 	ev.data.fd = sock;
 	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, &ev) < 0) {
-		pr_err("Failed starting IPC thread");
+		pr_err("Failed adding socket to epoll");
 		ret = -EFAULT;
 		goto err_epoll;
 	}
 
 	stop_fd = eventfd(0, 0);
 	if (stop_fd < 0) {
+		perror("eventfd");
 		ret = stop_fd;
 		goto err_epoll;
 	}
-- 
1.7.10.4
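
The pattern in [PATCH 2/6], naming the failing call on each exit path, can be shown outside kvmtool. This is a rough sketch under assumptions: listen_unix() is a hypothetical helper, and it passes sizeof(local) to bind() instead of the length computed in the patch.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a listening AF_UNIX socket, or -1 after
 * reporting which call failed via perror().
 */
static int listen_unix(const char *path)
{
    struct sockaddr_un local;
    int s;

    if (strlen(path) >= sizeof(local.sun_path)) {
        fprintf(stderr, "socket path too long\n");
        return -1;
    }

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket");
        return -1;
    }

    memset(&local, 0, sizeof(local));
    local.sun_family = AF_UNIX;
    strcpy(local.sun_path, path);   /* length checked above */

    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind");
        goto fail;
    }

    if (listen(s, 5) < 0) {
        perror("listen");
        goto fail;
    }

    return s;

fail:
    close(s);
    return -1;
}
```

Binding under a directory that does not exist makes the helper print a "bind: ..." message and return -1, so the failing step is visible without a debugger.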



[PATCH 4/6] kvm tools: powerpc: Fix buglet in xics_init() handling of nrcpus

2013-02-06 Thread Michael Ellerman
In xics_init() we set the maximum server to kvm->nrcpus, and then set
the nr_servers using maximum server + 1.

That is off by one, in the harmless direction.

Simplify it to just set nr_servers = kvm->nrcpus.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/xics.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/kvm/powerpc/xics.c b/tools/kvm/powerpc/xics.c
index d4b5caa..cf64a08 100644
--- a/tools/kvm/powerpc/xics.c
+++ b/tools/kvm/powerpc/xics.c
@@ -445,16 +445,13 @@ static void rtas_int_on(struct kvm_cpu *vcpu, uint32_t token,
 
 static int xics_init(struct kvm *kvm)
 {
-	int max_server_num;
 	unsigned int i;
 	struct icp_state *icp;
 	struct ics_state *ics;
 	int j;
 
-	max_server_num = kvm->nrcpus;
-
 	icp = malloc(sizeof(*icp));
-	icp->nr_servers = max_server_num + 1;
+	icp->nr_servers = kvm->nrcpus;
 	icp->ss = malloc(icp->nr_servers * sizeof(struct icp_server_state));
 
 	for (i = 0; i < icp->nr_servers; i++) {
-- 
1.7.10.4
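
The sizing fixed by [PATCH 4/6] can be illustrated with a small sketch. The struct and function names below are hypothetical stand-ins, not the kvmtool definitions. With servers numbered 0 to nrcpus - 1, nrcpus entries are enough; the old "+ 1" only allocated one unused slot.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the per-server and controller state */
struct server_state {
    int pending;
};

struct icp {
    unsigned int nr_servers;
    struct server_state *ss;
};

/* Allocate one server entry per vcpu: indices 0 .. nrcpus - 1 */
static struct icp *icp_alloc(unsigned int nrcpus)
{
    struct icp *icp = malloc(sizeof(*icp));

    if (!icp)
        return NULL;

    icp->nr_servers = nrcpus;
    icp->ss = calloc(icp->nr_servers, sizeof(*icp->ss));
    if (!icp->ss) {
        free(icp);
        return NULL;
    }

    return icp;
}
```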



[PATCH 1/6] kvm tools: Return error status in lkvm list

2013-02-06 Thread Michael Ellerman
Currently list always returns 0, even if there was an error. Instead
have it accumulate any errors and return that.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/builtin-list.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-list.c b/tools/kvm/builtin-list.c
index 9299f17..c35be93 100644
--- a/tools/kvm/builtin-list.c
+++ b/tools/kvm/builtin-list.c
@@ -123,7 +123,7 @@ static void parse_setup_options(int argc, const char **argv)
 
 int kvm_cmd_list(int argc, const char **argv, const char *prefix)
 {
-   int r;
+   int status, r;
 
parse_setup_options(argc, argv);
 
@@ -133,17 +133,23 @@ int kvm_cmd_list(int argc, const char **argv, const char *prefix)
 	printf("%6s %-20s %s\n", "PID", "NAME", "STATE");
printf(\n);
 
+   status = 0;
+
 	if (run) {
 		r = kvm_list_running_instances();
 		if (r < 0)
 			perror("Error listing instances");
+
+		status |= r;
 	}
 
 	if (rootfs) {
 		r = kvm_list_rootfs();
 		if (r < 0)
 			perror("Error listing rootfs");
+
+   status |= r;
}
 
-   return 0;
+   return status;
 }
-- 
1.7.10.4
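
The accumulation used in [PATCH 1/6] is easy to try in isolation. In this sketch list_all() is a hypothetical function whose two parameters stand in for the return values of the two listing steps (0 on success, negative on error).

```c
/*
 * Hypothetical stand-in for the listing steps: each argument is the
 * result of one step. The results are OR-ed together, so the return
 * value is 0 only when every step returned 0.
 */
static int list_all(int run_result, int rootfs_result)
{
    int status = 0;

    status |= run_result;
    status |= rootfs_result;

    return status;
}
```

One property worth noting: OR-ing two different negative values gives a non-zero result that need not equal either error code, so the caller can rely on "non-zero means failure" but not on the specific value.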



[PATCH 6/6] kvm tools: powerpc: Only emit TB freq if it's non-zero

2013-02-06 Thread Michael Ellerman
The kernel can handle a missing timebase-frequency property much better
than one that claims zero.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/kvm.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index dc9f89d..b4b9f82 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -389,7 +389,9 @@ static int setup_fdt(struct kvm *kvm)
 		_FDT(fdt_property_cell(fdt, "dcache-block-size", cpu_info->d_bsize));
 		_FDT(fdt_property_cell(fdt, "icache-block-size", cpu_info->i_bsize));
 
-		_FDT(fdt_property_cell(fdt, "timebase-frequency", cpu_info->tb_freq));
+		if (cpu_info->tb_freq)
+			_FDT(fdt_property_cell(fdt, "timebase-frequency", cpu_info->tb_freq));
+
 		/* Lies, but safeish lies! */
 		_FDT(fdt_property_cell(fdt, "clock-frequency", 0xddbab200));
 
-- 
1.7.10.4
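
The conditional emit in [PATCH 6/6] can be sketched without libfdt. Here emit_cell() is a hypothetical stand-in for fdt_property_cell() that just prints the property, and emit_cpu_props() is likewise an illustration, not kvmtool code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for fdt_property_cell(): prints, returns 1 */
static int emit_cell(const char *name, uint32_t val)
{
    printf("%s = <0x%x>\n", name, (unsigned int)val);
    return 1;
}

/* Returns how many properties were emitted */
static int emit_cpu_props(uint32_t tb_freq)
{
    int emitted = 0;

    /* Omit the property when the value is unknown (zero) */
    if (tb_freq)
        emitted += emit_cell("timebase-frequency", tb_freq);

    emitted += emit_cell("clock-frequency", 0xddbab200);

    return emitted;
}
```

A zero tb_freq yields a device tree with no timebase-frequency property at all, which is the case the commit message says the kernel handles better.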



[PATCH 5/6] kvm tools: powerpc: Add cpu info entry for POWER8

2013-02-06 Thread Michael Ellerman
We should hard-code less of this stuff, but for now this works.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/cpu_info.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/tools/kvm/powerpc/cpu_info.c b/tools/kvm/powerpc/cpu_info.c
index 11ca14e..a9dfe39 100644
--- a/tools/kvm/powerpc/cpu_info.c
+++ b/tools/kvm/powerpc/cpu_info.c
@@ -35,6 +35,20 @@ static struct cpu_info cpu_power7_info = {
},
 };
 
+/* POWER8 */
+
+static struct cpu_info cpu_power8_info = {
+	.name = "POWER8",
+   .tb_freq = 51200,
+   .d_bsize = 128,
+   .i_bsize = 128,
+   .flags = CPUINFO_FLAG_DFP | CPUINFO_FLAG_VSX | CPUINFO_FLAG_VMX,
+   .mmu_info = {
+   .flags = KVM_PPC_PAGE_SIZES_REAL | KVM_PPC_1T_SEGMENTS,
+   .slb_size = 32,
+   },
+};
+
 /* PPC970/G5 */
 
 static struct cpu_info cpu_970_info = {
@@ -52,6 +66,7 @@ static struct pvr_info host_pvr_info[] = {
{ 0x, 0x0f03, cpu_power7_info },
{ 0x, 0x003f, cpu_power7_info },
{ 0x, 0x004a, cpu_power7_info },
+   { 0x, 0x004b, cpu_power8_info },
{ 0x, 0x0039, cpu_970_info },
{ 0x, 0x003c, cpu_970_info },
 { 0x, 0x0044, cpu_970_info },
-- 
1.7.10.4



Re: [PATCH] tcm_vhost: Multi-queue support

2013-02-06 Thread Nicholas A. Bellinger
On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
> On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
> > On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
> > > This adds virtio-scsi multi-queue support to tcm_vhost.
> > >
> > > Guest side virtio-scsi multi-queue support can be found here:
> > >
> > >    https://lkml.org/lkml/2012/12/18/166
> > >
> > > Some initial perf numbers:
> > > 1 queue,  4 targets, 1 lun per target
> > > 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
> > >
> > > 4 queues, 4 targets, 1 lun per target
> > > 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
> > >
> >
> > Nice single LUN small block random I/O improvement here with 4x vqueues.
> >
> > Curious to see how virtio-scsi small block performance looks with
> > SCSI-core to multi-LUN tcm_vhost endpoints as well..  8-)
>
> Do you mean something like this?
>
> 1 queue,  2 targets, 2 lun per target
> 4 queue,  2 targets, 2 lun per target
>
> > Btw, this does not apply atop current target-pending.git/for-next with
> > your other pending vhost patch series, and AFAICT this patch is supposed
> > to apply on top of your last PATCH-v3, no..?
>
> Ah, this applies on top of mst's 'tcm_vhost: fix pr_err on early kick
> patch.' plus my last v3 of 'tcm_vhost: Multi-target support'.
>

In that case, applying this patch + PATCH-v3 to auto-next for testing
for the moment, and will respin for-next against upstream w/ MST's patch
shortly.

Also, please include a proper changelog for this second patch.  :)

Thank you!

--nab



> > --nab
> >
> > > Signed-off-by: Asias He as...@redhat.com
> > > ---
> > >  drivers/vhost/tcm_vhost.c | 46 +-
> > >  drivers/vhost/tcm_vhost.h |  2 ++
> > >  2 files changed, 31 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
> > > index 81ecda5..9951297 100644
> > > --- a/drivers/vhost/tcm_vhost.c
> > > +++ b/drivers/vhost/tcm_vhost.c
> > > @@ -48,6 +48,7 @@
> > >  #include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
> > >  #include <linux/virtio_scsi.h>
> > >  #include <linux/llist.h>
> > > +#include <linux/bitmap.h>
> > >  
> > >  #include "vhost.c"
> > >  #include "vhost.h"
> > > @@ -59,7 +60,8 @@ enum {
> > >  	VHOST_SCSI_VQ_IO = 2,
> > >  };
> > >  
> > > -#define VHOST_SCSI_MAX_TARGET 256
> > > +#define VHOST_SCSI_MAX_TARGET	256
> > > +#define VHOST_SCSI_MAX_VQ	128
> > >  
> > >  struct vhost_scsi {
> > >  	/* Protected by vhost_scsi->dev.mutex */
> > > @@ -68,7 +70,7 @@ struct vhost_scsi {
> > >  	bool vs_endpoint;
> > >  
> > >  	struct vhost_dev dev;
> > > -	struct vhost_virtqueue vqs[3];
> > > +	struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
> > >  
> > >  	struct vhost_work vs_completion_work; /* cmd completion work item */
> > >  	struct llist_head vs_completion_list; /* cmd completion queue */
> > > @@ -366,12 +368,14 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
> > >  {
> > >  	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
> > >  					vs_completion_work);
> > > +	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
> > >  	struct virtio_scsi_cmd_resp v_rsp;
> > >  	struct tcm_vhost_cmd *tv_cmd;
> > >  	struct llist_node *llnode;
> > >  	struct se_cmd *se_cmd;
> > > -	int ret;
> > > +	int ret, vq;
> > >  
> > > +	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
> > >  	llnode = llist_del_all(&vs->vs_completion_list);
> > >  	while (llnode) {
> > >  		tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd,
> > > @@ -390,15 +394,20 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
> > >  		memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf,
> > >  		       v_rsp.sense_len);
> > >  		ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp));
> > > -		if (likely(ret == 0))
> > > -			vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0);
> > > -		else
> > > +		if (likely(ret == 0)) {
> > > +			vhost_add_used(tv_cmd->tvc_vq, tv_cmd->tvc_vq_desc, 0);
> > > +			vq = tv_cmd->tvc_vq - vs->vqs;
> > > +			__set_bit(vq, signal);
> > > +		} else
> > >  			pr_err("Faulted on virtio_scsi_cmd_resp\n");
> > >  
> > >  		vhost_scsi_free_cmd(tv_cmd);
> > >  	}
> > >  
> > > -	vhost_signal(&vs->dev, &vs->vqs[2]);
> > > +	vq = -1;
> > > +	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
> > > +		< VHOST_SCSI_MAX_VQ)
> > > +		vhost_signal(&vs->dev, &vs->vqs[vq]);
> > >  }
> > >  
> > >  static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
> > > @@ -561,9 +570,9 @@ static void tcm_vhost_submission_work(struct work_struct *work)
> > >  	}
> > >  }
> > >  
> > > -static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
> > > +static void vhost_scsi_handle_vq(struct vhost_scsi *vs,
> > > +	struct vhost_virtqueue *vq)
> > >  {
> > > -	struct vhost_virtqueue *vq = &vs->vqs[2];
> > >  	struct virtio_scsi_cmd_req v_req;
> > >  	struct tcm_vhost_tpg *tv_tpg;
> > >  	struct tcm_vhost_cmd *tv_cmd;
> > > @@ -656,7 +665,7 @@ static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
> > >  		ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> > >  		if (!ret)
> > >  			vhost_add_used_and_signal(&vs->dev,
> > > -

Re: [PATCH] tcm_vhost: Multi-queue support

2013-02-06 Thread Asias He
On 02/06/2013 04:39 PM, Nicholas A. Bellinger wrote:
> On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
> > On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
> > > On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
> > > > This adds virtio-scsi multi-queue support to tcm_vhost.
> > > >
> > > > Guest side virtio-scsi multi-queue support can be found here:
> > > >
> > > >    https://lkml.org/lkml/2012/12/18/166
> > > >
> > > > Some initial perf numbers:
> > > > 1 queue,  4 targets, 1 lun per target
> > > > 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
> > > >
> > > > 4 queues, 4 targets, 1 lun per target
> > > > 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
> > > >
> > >
> > > Nice single LUN small block random I/O improvement here with 4x vqueues.
> > >
> > > Curious to see how virtio-scsi small block performance looks with
> > > SCSI-core to multi-LUN tcm_vhost endpoints as well..  8-)
> >
> > Do you mean something like this?
> >
> > 1 queue,  2 targets, 2 lun per target
> > 4 queue,  2 targets, 2 lun per target
> >
> > > Btw, this does not apply atop current target-pending.git/for-next with
> > > your other pending vhost patch series, and AFAICT this patch is supposed
> > > to apply on top of your last PATCH-v3, no..?
> >
> > Ah, this applies on top of mst's 'tcm_vhost: fix pr_err on early kick
> > patch.' plus my last v3 of 'tcm_vhost: Multi-target support'.
> >
>
> In that case, applying this patch + PATCH-v3 to auto-next for testing
> for the moment, and will respin for-next against upstream w/ MST's patch
> shortly.

Okay. Looking forward to more perf numbers.

> Also, please include a proper changelog for this second patch.  :)

Sure.



tcm_vhost: Multi-queue support

This adds virtio-scsi multi-queue support to tcm_vhost. In order to use
multi-queue, guest side multi-queue support is needed. It can
be found here:

   https://lkml.org/lkml/2012/12/18/166

Currently, only one thread is created by vhost core code for each
vhost_scsi instance. Even if there are multiple queues, all the handling
of guest kicks (vhost_scsi_handle_kick) is processed in one thread. This
is not optimal. Luckily, most of the work is offloaded to the tcm_vhost
workqueue.

Some initial perf numbers:
1 queue,  4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 127K/127k IOPS

4 queues, 4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 181K/181k IOPS

Signed-off-by: Asias He as...@redhat.com
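
The completion path in the quoted patch records which virtqueues saw a completion in a bitmap and then signals each of those queues once. A userspace sketch of that idea follows. It is not the kernel code: struct vq, signal_completed() and the open-coded bitmap are hypothetical stand-ins for struct vhost_virtqueue, vhost_signal() and the kernel bitmap helpers.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_VQ        128
#define LONG_BITS     (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS  ((MAX_VQ + LONG_BITS - 1) / LONG_BITS)

/* Hypothetical stand-in for struct vhost_virtqueue */
struct vq {
    int unused;
};

/*
 * Mark the queue of every completed command, then "signal" each marked
 * queue exactly once by bumping its counter in signalled[].
 * Returns the number of distinct queues signalled.
 */
static int signal_completed(const struct vq *vqs, struct vq *const *done,
                            size_t n, int *signalled)
{
    unsigned long pending[BITMAP_LONGS] = { 0 };
    size_t i;
    int count = 0;

    for (i = 0; i < n; i++) {
        /* Pointer subtraction gives the queue's index in the array */
        size_t idx = (size_t)(done[i] - vqs);

        pending[idx / LONG_BITS] |= 1UL << (idx % LONG_BITS);
    }

    for (i = 0; i < MAX_VQ; i++) {
        if (pending[i / LONG_BITS] & (1UL << (i % LONG_BITS))) {
            signalled[i]++;
            count++;
        }
    }

    return count;
}
```

Two completions on the same queue set the same bit, so that queue is signalled once per batch instead of once per command.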




Re: [PATCH 1/6] kvm tools: Return error status in lkvm list

2013-02-06 Thread Pekka Enberg
Applied all patches, thanks a lot Michael!


Re: [PATCH] tcm_vhost: Multi-queue support

2013-02-06 Thread Paolo Bonzini


----- Original Message -----
> From: Asias He as...@redhat.com
> To: Nicholas A. Bellinger n...@linux-iscsi.org
> Cc: Paolo Bonzini pbonz...@redhat.com, Stefan Hajnoczi stefa...@redhat.com, Michael S. Tsirkin
> m...@redhat.com, Rusty Russell ru...@rustcorp.com.au, kvm@vger.kernel.org,
> virtualizat...@lists.linux-foundation.org, target-de...@vger.kernel.org
> Sent: Wednesday, 6 February 2013 10:51:34
> Subject: Re: [PATCH] tcm_vhost: Multi-queue support
>
> On 02/06/2013 04:39 PM, Nicholas A. Bellinger wrote:
> > On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
> > > On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
> > > > On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
> > > > > This adds virtio-scsi multi-queue support to tcm_vhost.
> > > > >
> > > > > Guest side virtio-scsi multi-queue support can be found here:
> > > > >
> > > > >    https://lkml.org/lkml/2012/12/18/166
> > > > >
> > > > > Some initial perf numbers:
> > > > > 1 queue,  4 targets, 1 lun per target
> > > > > 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
> > > > >
> > > > > 4 queues, 4 targets, 1 lun per target
> > > > > 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS

4 VCPUs I suppose?

Paolo


Re: [PATCH v2 07/18] KVM/MIPS32: MMU/TLB operations for the Guest.

2013-02-06 Thread Gleb Natapov
On Wed, Nov 21, 2012 at 06:34:05PM -0800, Sanjay Lal wrote:
> - Note that this file is statically linked with the rest of the host kernel
> (KSEG0). This is because kernel modules are loaded into mapped space on MIPS
> and we want to make sure that we don't get any host kernel TLB faults while
> manipulating TLBs.
> - Virtual Guest TLBs are implemented as 64 entry array regardless of the
> number of host TLB entries.
> - Shadow TLBs map Guest virtual addresses to Host physical addresses.
>
> - TLB miss handling details:
> 	Guest KSEG0 TLBMISS (0x4000 – 0x6000): Transparent to the Guest.
> 	Guest KSEG2/3 (0x6000 – 0x8000) & Guest UM TLBMISS (0x – 0x4000)
> 		Lookup in Guest/Virtual TLB
> 			If an entry doesn't match
> 				deliver appropriate TLBMISS LD/ST exception to the guest
> 			If entry does exist in the Guest TLB and is NOT Valid
> 				Deliver TLB invalid exception to the guest
> 			If entry does exist in the Guest TLB and is VALID
> 				Inject the TLB entry into the Shadow TLB
>
> Signed-off-by: Sanjay Lal sanj...@kymasys.com
> ---
>  arch/mips/kvm/kvm_tlb.c | 932
>  1 file changed, 932 insertions(+)
>  create mode 100644 arch/mips/kvm/kvm_tlb.c
>
> diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
> new file mode 100644
> index 0000000..2d24333
> --- /dev/null
> +++ b/arch/mips/kvm/kvm_tlb.c
> @@ -0,0 +1,932 @@
> +/*
> +* This file is subject to the terms and conditions of the GNU General Public
> +* License.  See the file "COPYING" in the main directory of this archive
> +* for more details.
> +*
> +* KVM/MIPS TLB handling, this file is part of the Linux host kernel so that
> +* TLB handlers run from KSEG0
> +*
> +* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
> +* Authors: Sanjay Lal sanj...@kymasys.com
> +*/
> +
> +#include <linux/init.h>
> +#include <linux/sched.h>
> +#include <linux/smp.h>
> +#include <linux/mm.h>
> +#include <linux/delay.h>
> +#include <linux/module.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/cpu.h>
> +#include <asm/bootinfo.h>
> +#include <asm/mmu_context.h>
> +#include <asm/pgtable.h>
> +#include <asm/cacheflush.h>
> +
> +#undef CONFIG_MIPS_MT
> +#include <asm/r4kcache.h>
> +#define CONFIG_MIPS_MT
> +
> +#define KVM_GUEST_PC_TLB    0
> +#define KVM_GUEST_SP_TLB    1
> +
> +#define PRIx64 "llx"
> +
> +/* Use VZ EntryHi.EHINV to invalidate TLB entries */
> +#define UNIQUE_ENTRYHI(idx) (CKSEG0 + ((idx) << (PAGE_SHIFT + 1)))
> +
> +atomic_t kvm_mips_instance;
> +EXPORT_SYMBOL(kvm_mips_instance);
> +
> +/* These function pointers are initialized once the KVM module is loaded */
> +pfn_t(*kvm_mips_gfn_to_pfn) (struct kvm *kvm, gfn_t gfn);
> +EXPORT_SYMBOL(kvm_mips_gfn_to_pfn);
> +
> +void (*kvm_mips_release_pfn_clean) (pfn_t pfn);
> +EXPORT_SYMBOL(kvm_mips_release_pfn_clean);
> +
> +bool(*kvm_mips_is_error_pfn) (pfn_t pfn);
> +EXPORT_SYMBOL(kvm_mips_is_error_pfn);
> +
> +uint32_t kvm_mips_get_kernel_asid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.guest_kernel_asid[smp_processor_id()] & ASID_MASK;
> +}
> +
> +
> +uint32_t kvm_mips_get_user_asid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.guest_user_asid[smp_processor_id()] & ASID_MASK;
> +}
> +
> +inline uint32_t kvm_mips_get_commpage_asid (struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->kvm->arch.commpage_tlb;
> +}
> +
> +
> +/*
> + * Structure defining an tlb entry data set.
> + */
> +
> +void kvm_mips_dump_host_tlbs(void)
> +{
> +	struct kvm_mips_tlb tlb;
> +	int i;
> +	ulong flags;
> +	unsigned long old_entryhi;
> +	unsigned long old_pagemask;
> +
> +	local_irq_save(flags);
> +
> +	old_entryhi = read_c0_entryhi();
> +	old_pagemask = read_c0_pagemask();
> +
> +	printk("HOST TLBs:\n");
> +	printk("ASID: %#lx\n", read_c0_entryhi() & ASID_MASK);
> +
> +	for (i = 0; i < current_cpu_data.tlbsize; i++) {
> +		write_c0_index(i);
> +		mtc0_tlbw_hazard();
> +
> +		tlb_read();
> +		tlbw_use_hazard();
> +
> +		tlb.tlb_hi = read_c0_entryhi();
> +		tlb.tlb_lo0 = read_c0_entrylo0();
> +		tlb.tlb_lo1 = read_c0_entrylo1();
> +		tlb.tlb_mask = read_c0_pagemask();
> +
> +		printk("TLB%c%3d Hi 0x%08lx ",
> +		       (tlb.tlb_lo0 | tlb.tlb_lo1) & MIPS3_PG_V ? ' ' : '*',
> +		       i, tlb.tlb_hi);
> +		printk("Lo0=0x%09" PRIx64 " %c%c attr %lx ",
> +		       (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo0),
> +		       (tlb.tlb_lo0 & MIPS3_PG_D) ? 'D' : ' ',
> +		       (tlb.tlb_lo0 & MIPS3_PG_G) ? 'G' : ' ',
> +		       (tlb.tlb_lo0 >> 3) & 7);
> +		printk("Lo1=0x%09" PRIx64 " %c%c attr %lx sz=%lx\n",
> +		       (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo1),
> +		       (tlb.tlb_lo1 & MIPS3_PG_D) ? 'D' : ' ',
> +		       (tlb.tlb_lo1 & MIPS3_PG_G) ? 'G' : ' ',
> +		       (tlb.tlb_lo1 >> 3) & 7, tlb.tlb_mask);
> +	}
> +

[PATCH] kvm tools: arm: fix GIC #defines to match latest kvm code

2013-02-06 Thread Will Deacon
During the review process for the KVM ARM patches, the GIC device
registration was subjected to some minor renaming, so update kvm tool
appropriately.

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/arm/gic.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c
index 3f42c3a..8d2ff87 100644
--- a/tools/kvm/arm/gic.c
+++ b/tools/kvm/arm/gic.c
@@ -22,15 +22,15 @@ int gic__alloc_irqnum(void)
 int gic__init_irqchip(struct kvm *kvm)
 {
 	int err;
-	struct kvm_device_address gic_addr[] = {
+	struct kvm_arm_device_addr gic_addr[] = {
 		[0] = {
-			.id = (KVM_ARM_DEVICE_VGIC_V2 << KVM_DEVICE_ID_SHIFT) |\
-			       KVM_VGIC_V2_ADDR_TYPE_DIST,
+			.id = KVM_VGIC_V2_ADDR_TYPE_DIST |
+			(KVM_ARM_DEVICE_VGIC_V2 << KVM_ARM_DEVICE_ID_SHIFT),
 			.addr = ARM_GIC_DIST_BASE,
 		},
 		[1] = {
-			.id = (KVM_ARM_DEVICE_VGIC_V2 << KVM_DEVICE_ID_SHIFT) |\
-			       KVM_VGIC_V2_ADDR_TYPE_CPU,
+			.id = KVM_VGIC_V2_ADDR_TYPE_CPU |
+			(KVM_ARM_DEVICE_VGIC_V2 << KVM_ARM_DEVICE_ID_SHIFT),
 			.addr = ARM_GIC_CPUI_BASE,
 		}
 	};
@@ -45,11 +45,11 @@ int gic__init_irqchip(struct kvm *kvm)
 	if (err)
 		return err;
 
-	err = ioctl(kvm->vm_fd, KVM_SET_DEVICE_ADDRESS, &gic_addr[0]);
+	err = ioctl(kvm->vm_fd, KVM_ARM_SET_DEVICE_ADDR, &gic_addr[0]);
 	if (err)
 		return err;
 
-	err = ioctl(kvm->vm_fd, KVM_SET_DEVICE_ADDRESS, &gic_addr[1]);
+	err = ioctl(kvm->vm_fd, KVM_ARM_SET_DEVICE_ADDR, &gic_addr[1]);
 	return err;
 }
 
-- 
1.8.0
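
The .id fields in the GIC patch pack a device number and an address type into one value. A small sketch of that packing follows; the constants are illustrative assumptions chosen for the example, and the authoritative values are whatever the kernel's ARM KVM uapi header defines. dev_addr_id() is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative values only; see the kernel uapi header for the real ones */
#define DEV_ID_SHIFT    16
#define DEV_VGIC_V2     0
#define ADDR_TYPE_DIST  0
#define ADDR_TYPE_CPU   1

/* Device number goes in the upper bits, address type in the lower bits */
static uint64_t dev_addr_id(uint32_t device, uint32_t type)
{
    return ((uint64_t)device << DEV_ID_SHIFT) | type;
}
```

Under these assumed values the distributor and CPU interface ids differ only in the low type bits, which is why the patch can write the two initialisers symmetrically.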



Re: [PATCH] kvm tools: arm: fix GIC #defines to match latest kvm code

2013-02-06 Thread Pekka Enberg
On Wed, Feb 6, 2013 at 2:12 PM, Will Deacon will.dea...@arm.com wrote:
> During the review process for the KVM ARM patches, the GIC device
> registration was subjected to some minor renaming, so update kvm tool
> appropriately.
>
> Signed-off-by: Will Deacon will.dea...@arm.com

Applied, thanks Will!


Re: [PATCH v2 02/18] KVM/MIPS32: Arch specific KVM data structures.

2013-02-06 Thread Gleb Natapov
On Wed, Nov 21, 2012 at 06:34:00PM -0800, Sanjay Lal wrote:
> +struct kvm_mips_callbacks {
> +	int (*handle_cop_unusable) (struct kvm_vcpu *vcpu);
> +	int (*handle_tlb_mod) (struct kvm_vcpu *vcpu);
> +	int (*handle_tlb_ld_miss) (struct kvm_vcpu *vcpu);
> +	int (*handle_tlb_st_miss) (struct kvm_vcpu *vcpu);
> +	int (*handle_addr_err_st) (struct kvm_vcpu *vcpu);
> +	int (*handle_addr_err_ld) (struct kvm_vcpu *vcpu);
> +	int (*handle_syscall) (struct kvm_vcpu *vcpu);
> +	int (*handle_res_inst) (struct kvm_vcpu *vcpu);
> +	int (*handle_break) (struct kvm_vcpu *vcpu);
> +	int (*vm_init) (struct kvm *kvm);
> +	int (*vcpu_init) (struct kvm_vcpu *vcpu);
> +	int (*vcpu_setup) (struct kvm_vcpu *vcpu);
> +	 gpa_t(*gva_to_gpa) (gva_t gva);
> +	void (*queue_timer_int) (struct kvm_vcpu *vcpu);
> +	void (*dequeue_timer_int) (struct kvm_vcpu *vcpu);
> +	void (*queue_io_int) (struct kvm_vcpu *vcpu,
> +			      struct kvm_mips_interrupt *irq);
> +	void (*dequeue_io_int) (struct kvm_vcpu *vcpu,
> +				struct kvm_mips_interrupt *irq);
> +	int (*irq_deliver) (struct kvm_vcpu *vcpu, unsigned int priority,
> +			    uint32_t cause);
> +	int (*irq_clear) (struct kvm_vcpu *vcpu, unsigned int priority,
> +			  uint32_t cause);
> +	int (*vcpu_ioctl_get_regs) (struct kvm_vcpu *vcpu,
> +				    struct kvm_regs *regs);
> +	int (*vcpu_ioctl_set_regs) (struct kvm_vcpu *vcpu,
> +				    struct kvm_regs *regs);
> +};
You haven't addressed Avi's comment about dropping the interaction and
adding it later, when other HW is supported and the best way to do the split
is known.

--
Gleb.


Re: [PATCH v2 09/18] KVM/MIPS32: COP0 accesses profiling.

2013-02-06 Thread Gleb Natapov
On Wed, Nov 21, 2012 at 06:34:07PM -0800, Sanjay Lal wrote:
 
> Signed-off-by: Sanjay Lal sanj...@kymasys.com
> ---
>  arch/mips/kvm/kvm_mips_stats.c | 81 ++
>  1 file changed, 81 insertions(+)
>  create mode 100644 arch/mips/kvm/kvm_mips_stats.c
>
> diff --git a/arch/mips/kvm/kvm_mips_stats.c b/arch/mips/kvm/kvm_mips_stats.c
> new file mode 100644
> index 0000000..e442a26
> --- /dev/null
> +++ b/arch/mips/kvm/kvm_mips_stats.c
> @@ -0,0 +1,81 @@
> +/*
> +* This file is subject to the terms and conditions of the GNU General Public
> +* License.  See the file "COPYING" in the main directory of this archive
> +* for more details.
> +*
> +* KVM/MIPS: COP0 access histogram
> +*
> +* Copyright (C) 2012  MIPS Technologies, Inc.  All rights reserved.
> +* Authors: Sanjay Lal sanj...@kymasys.com
> +*/
> +
> +#include <linux/kvm_host.h>
> +
> +char *kvm_mips_exit_types_str[MAX_KVM_MIPS_EXIT_TYPES] = {
> +	"WAIT",
> +	"CACHE",
> +	"Signal",
> +	"Interrupt",
> +	"COP0/1 Unusable",
> +	"TLB Mod",
> +	"TLB Miss (LD)",
> +	"TLB Miss (ST)",
> +	"Address Err (ST)",
> +	"Address Error (LD)",
> +	"System Call",
> +	"Reserved Inst",
> +	"Break Inst",
> +	"D-Cache Flushes",
> +};
> +
> +char *kvm_cop0_str[N_MIPS_COPROC_REGS] = {
> +	"Index",
> +	"Random",
> +	"EntryLo0",
> +	"EntryLo1",
> +	"Context",
> +	"PG Mask",
> +	"Wired",
> +	"HWREna",
> +	"BadVAddr",
> +	"Count",
> +	"EntryHI",
> +	"Compare",
> +	"Status",
> +	"Cause",
> +	"EXC PC",
> +	"PRID",
> +	"Config",
> +	"LLAddr",
> +	"Watch Lo",
> +	"Watch Hi",
> +	"X Context",
> +	"Reserved",
> +	"Impl Dep",
> +	"Debug",
> +	"DEPC",
> +	"PerfCnt",
> +	"ErrCtl",
> +	"CacheErr",
> +	"TagLo",
> +	"TagHi",
> +	"ErrorEPC",
> +	"DESAVE"
> +};
> +
> +int kvm_mips_dump_stats(struct kvm_vcpu *vcpu)
> +{
> +	int i, j __unused;
> +#ifdef CONFIG_KVM_MIPS_DEBUG_COP0_COUNTERS
> +	printk("\nKVM VCPU[%d] COP0 Access Profile:\n", vcpu->vcpu_id);
> +	for (i = 0; i < N_MIPS_COPROC_REGS; i++) {
> +		for (j = 0; j < N_MIPS_COPROC_SEL; j++) {
> +			if (vcpu->arch.cop0->stat[i][j])
> +				printk("%s[%d]: %lu\n", kvm_cop0_str[i], j,
> +				       vcpu->arch.cop0->stat[i][j]);
> +		}
> +	}
> +#endif
> +
> +	return 0;
> +}
You need to use ftrace events for that. Much more flexible, with perf
integration, and no need to recompile to enable/disable.

--
Gleb.


Re: [PATCH v2 11/18] KVM/MIPS32: Routines to handle specific traps/exceptions while executing the guest.

2013-02-06 Thread Gleb Natapov
On Wed, Nov 21, 2012 at 06:34:09PM -0800, Sanjay Lal wrote:
> +static gpa_t kvm_trap_emul_gva_to_gpa_cb(gva_t gva)
> +{
> +	gpa_t gpa;
> +	uint32_t kseg = KSEGX(gva);
> +
> +	if ((kseg == CKSEG0) || (kseg == CKSEG1))
You seem to be using KVM_GUEST_KSEGX variants on gva in all other
places. Why not here?

> +		gpa = CPHYSADDR(gva);
> +	else {
> +		printk("%s: cannot find GPA for GVA: %#lx\n", __func__, gva);
> +		kvm_mips_dump_host_tlbs();
> +		gpa = KVM_INVALID_ADDR;
> +	}
> +
> +#ifdef DEBUG
> +	kvm_debug("%s: gva %#lx, gpa: %#llx\n", __func__, gva, gpa);
> +#endif
> +
> +	return gpa;
> +}
> +

--
Gleb.
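
For reference, the translation in the quoted function can be sketched in userspace. This assumes the standard MIPS32 layout (KSEG0 at 0x80000000, KSEG1 at 0xa0000000, physical address in the low 29 bits) and uses the host-style macros as quoted, not the KVM_GUEST_* variants raised in the review. INVALID_ADDR and unmapped_gva_to_gpa() are hypothetical names.

```c
#include <stdint.h>

/* Standard MIPS32 segment helpers, redefined locally for the sketch */
#define KSEGX(a)      ((a) & 0xe0000000u)
#define CKSEG0        0x80000000u
#define CKSEG1        0xa0000000u
#define CPHYSADDR(a)  ((a) & 0x1fffffffu)

/* Hypothetical sentinel, standing in for KVM_INVALID_ADDR */
#define INVALID_ADDR  0xffffffffu

/* Translate an address in an unmapped segment; anything else is invalid */
static uint32_t unmapped_gva_to_gpa(uint32_t gva)
{
    uint32_t kseg = KSEGX(gva);

    if (kseg == CKSEG0 || kseg == CKSEG1)
        return CPHYSADDR(gva);

    return INVALID_ADDR;
}
```

Addresses outside the two unmapped segments would need a TLB lookup, which is why the quoted code treats them as an error in this callback.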


Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.

2013-02-06 Thread Glauber Costa
On 02/06/2013 01:49 AM, Michael Wolf wrote:
 Add a helper routine to scheduler/core.c to allow the kvm module
 to retrieve the cpu hardlimit settings.  The values will be used
 to set up a timer that is used to separate the consigned from the
 steal time.

Sorry, but what is a timer doing here?
Whenever we read steal time, we know how much time has passed, and with
that information we can work out the entitlement for the period. This breaks
if we suspend, but we know that we suspended, so this is not a problem.

Everything above the entitlement is steal time.
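
One possible reading of the argument above, as a minimal userspace sketch: given the time elapsed since the previous read and the hard limit, compute how much off-cpu time the limit alone explains, and treat anything beyond that as steal. The names (`not_run_split`, `split_not_run`) and the quota/period parameters are assumptions for illustration; they are not taken from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how the off-cpu time of one interval is split. */
struct not_run_split {
	uint64_t consigned_ns;	/* withheld because of the hard limit */
	uint64_t steal_ns;	/* lost beyond what the hard limit explains */
};

/*
 * elapsed_ns:  wall time since the previous read
 * not_run_ns:  time the vcpu wanted to run but did not, in that interval
 * quota_us, period_us: hard limit expressed as a quota per period (assumed)
 */
static struct not_run_split split_not_run(uint64_t elapsed_ns,
					  uint64_t not_run_ns,
					  uint64_t quota_us,
					  uint64_t period_us)
{
	struct not_run_split s;
	uint64_t entitled_ns, capped_ns;

	assert(period_us > 0);
	assert(not_run_ns <= elapsed_ns);
	if (quota_us > period_us)
		quota_us = period_us;	/* a quota above the period is no limit */

	/* Time the guest was entitled to run (may overflow for huge intervals). */
	entitled_ns = elapsed_ns * quota_us / period_us;
	/* Time the hard limit alone keeps the guest off the cpu. */
	capped_ns = elapsed_ns - entitled_ns;

	s.consigned_ns = not_run_ns < capped_ns ? not_run_ns : capped_ns;
	s.steal_ns = not_run_ns - s.consigned_ns;
	return s;
}
```

With a 50% limit, an interval of 1000 ns in which the vcpu was off the cpu for 700 ns would be split into 500 ns consigned and 200 ns steal. No timer is involved; only the elapsed time between reads is needed.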




Re: [PATCH 2/2] virtio-scsi: reset virtqueue affinity when doing cpu hotplug

2013-02-06 Thread Paolo Bonzini
On 16/01/2013 04:55, Wanlong Gao wrote:
  Add hot cpu notifier to reset the request virtqueue affinity
  when doing cpu hotplug.
  
  You need to be careful to get_online_cpus() and put_online_cpus() here,
  so CPUs can't go up and down in the middle of operations.
  
  In particular, get_online_cpus()/put_online_cpus() around calls to
  virtscsi_set_affinity() (except within notifiers).
 Yes, I'll take care of this, thank you.
 

I squashed patch 1 (plus changes to get/put_online_cpus) in my
multiqueue series, and applied this one as a separate patch.

Paolo


Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.

2013-02-06 Thread Michael Wolf

On 02/06/2013 08:36 AM, Glauber Costa wrote:

On 02/06/2013 01:49 AM, Michael Wolf wrote:

Add a helper routine to scheduler/core.c to allow the kvm module
to retrieve the cpu hardlimit settings.  The values will be used
to set up a timer that is used to separate the consigned from the
steal time.

Sorry: What is the business of a timer in here?
Whenever we read steal time, we know how much time has passed and with
that information we can know the entitlement for the period. This breaks
if we suspend, but we know that we suspended, so this is not a problem.

I may be missing something, but how do we know how much time has passed?
That is why I had the timer in there. I will look at the code again, but
I thought the data was collected as ticks and passed at random times. The
ticks also accumulate, so we are looking at the difference in the count
between reads.



Everything above the entitlement is steal time.

I agree, provided I know the total amount of time over which the steal
time was accumulated.





Re: DMAR faults from unrelated device when vfio is used

2013-02-06 Thread Richard Weinberger
Hi,

On Tue, 05 Feb 2013 13:36:53 -0700, Alex Williamson alex.william...@redhat.com wrote:
  Ugh, the infamous and useless error 10.  It could be anything.
  I've got a system with onboard usb3, let me see what windows does
  with it here first.  Thanks,
 
 Well, I've got an Etron USB3 HBA and (un)fortunately it works just
 fine with a Win7 guest.  There's really nothing special about USB
 controllers from a PCI device assignment perspective.  Have you tried
 the latest upstream qemu bits?  Thanks,

USB3 does not work within a Linux guest either.
xhci in debug mode gives a bit more information.

[1.157888] xhci_hcd :00:07.0: xHCI Host Controller
[1.157899] xhci_hcd :00:07.0: new USB bus registered, assigned bus 
number 4
[1.157948] xhci_hcd :00:07.0: // Halt the HC
[1.157957] xhci_hcd :00:07.0: Resetting HCD
[1.157962] xhci_hcd :00:07.0: // Reset the HC
[1.158111] usb 3-1: new full-speed USB device number 2 using uhci_hcd
[1.158125] xhci_hcd :00:07.0: Wait for controller to be ready for 
doorbell rings
[1.158130] xhci_hcd :00:07.0: Reset complete
[1.158133] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses.
[1.158135] xhci_hcd :00:07.0: Calling HCD init
[1.158136] xhci_hcd :00:07.0: xhci_init
[1.158137] xhci_hcd :00:07.0: xHCI doesn't need link TRB QUIRK
[1.158640] xhci_hcd :00:07.0: Finished xhci_init
[1.158642] xhci_hcd :00:07.0: Called HCD init
[1.158698] xhci_hcd :00:07.0: irq 11, io mem 0xfebf4000
[1.158699] xhci_hcd :00:07.0: xhci_run
[1.159578] xhci_hcd :00:07.0: irq 40 for MSI/MSI-X
[1.159697] xhci_hcd :00:07.0: irq 41 for MSI/MSI-X
[1.159720] xhci_hcd :00:07.0: irq 42 for MSI/MSI-X
[1.159736] xhci_hcd :00:07.0: irq 43 for MSI/MSI-X
[1.159752] xhci_hcd :00:07.0: irq 44 for MSI/MSI-X
[1.179682] xhci_hcd :00:07.0: Setting event ring polling timer
[1.179686] xhci_hcd :00:07.0: Command ring memory map follows:
[1.179693] xhci_hcd :00:07.0: ERST memory map follows:
[1.179695] xhci_hcd :00:07.0: Event ring:
[1.179702] xhci_hcd :00:07.0: ERST deq = 64'h36820400
[1.179703] xhci_hcd :00:07.0: // Set the interrupt modulation register
[1.179710] xhci_hcd :00:07.0: // Enable interrupts, cmd = 0x4.
[1.179715] xhci_hcd :00:07.0: // Enabling event ring interrupter 
c9e68620 by writing 0x2 to irq_pending
[1.179737] xhci_hcd :00:07.0: Finished xhci_run for USB2 roothub
[1.179752] usb usb4: New USB device found, idVendor=1d6b, idProduct=0002
[1.179753] usb usb4: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[1.179755] usb usb4: Product: xHCI Host Controller
[1.179756] usb usb4: Manufacturer: Linux 3.8.0-rc6-2.10-desktop xhci_hcd
[1.179757] usb usb4: SerialNumber: :00:07.0
[1.179967] xHCI xhci_add_endpoint called for root hub
[1.179971] xHCI xhci_check_bandwidth called for root hub
[1.180081] hub 4-0:1.0: USB hub found
[1.180094] hub 4-0:1.0: 2 ports detected
[1.180200] xhci_hcd :00:07.0: xHCI Host Controller
[1.180206] xhci_hcd :00:07.0: new USB bus registered, assigned bus 
number 5
[1.180214] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses.
[1.180219] xhci_hcd :00:07.0: // Turn on HC, cmd = 0x5.
[1.245201] xhci_hcd :00:07.0: Host took too long to start, waited 16000 
microseconds.

This one looks interesting.

[1.245414] xhci_hcd :00:07.0: // Halt the HC
[1.245424] xhci_hcd :00:07.0: startup error -19
[1.245551] xhci_hcd :00:07.0: USB bus 5 deregistered
[1.245556] xhci_hcd :00:07.0: remove, state 1
[1.245560] usb usb4: USB disconnect, device number 1
[1.245608] xHCI xhci_drop_endpoint called for root hub
[1.245609] xHCI xhci_check_bandwidth called for root hub
[1.245684] xhci_hcd :00:07.0: // Halt the HC
[1.245695] xhci_hcd :00:07.0: // Reset the HC
[1.245741] xhci_hcd :00:07.0: Wait for controller to be ready for 
doorbell rings
[1.256413] xhci_hcd :00:07.0: // Disabling event ring interrupts
[1.256427] xhci_hcd :00:07.0: cleaning up memory
[1.256440] xhci_hcd :00:07.0: xhci_stop completed - status = 1
[1.256446] xhci_hcd :00:07.0: USB bus 4 deregistered
[1.258194] ata_piix :00:01.1: version 2.13

Within the guest, lspci -vv gives:

00:07.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 
04) (prog-if 30 [XHCI])
Subsystem: Intel Corporation Device 2008
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
MAbort+ SERR- PERR- INTx-
Interrupt: pin A routed to IRQ 11
Region 0: Memory at febf4000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [50] Power Management version 3
Flags: 

Re: DMAR faults from unrelated device when vfio is used

2013-02-06 Thread Alex Williamson
On Wed, 2013-02-06 at 19:09 +0100, Richard Weinberger wrote:
 Hi,
 
 On Tue, 05 Feb 2013 13:36:53 -0700, Alex Williamson alex.william...@redhat.com wrote:
   Ugh, the infamous and useless error 10.  It could be anything.
   I've got a system with onboard usb3, let me see what windows does
   with it here first.  Thanks,
  
  Well, I've got an Etron USB3 HBA and (un)fortunately it works just
  fine with a Win7 guest.  There's really nothing special about USB
  controllers from a PCI device assignment perspective.  Have you tried
  the latest upstream qemu bits?  Thanks,
 
 USB3 does not work within a Linux guest either.
 xhci in debug mode gives a bit more information.

Does the card work with pci-assign or are both broken?
 
 [1.157888] xhci_hcd :00:07.0: xHCI Host Controller
 [1.157899] xhci_hcd :00:07.0: new USB bus registered, assigned bus 
 number 4
 [1.157948] xhci_hcd :00:07.0: // Halt the HC
 [1.157957] xhci_hcd :00:07.0: Resetting HCD
 [1.157962] xhci_hcd :00:07.0: // Reset the HC
 [1.158111] usb 3-1: new full-speed USB device number 2 using uhci_hcd
 [1.158125] xhci_hcd :00:07.0: Wait for controller to be ready for 
 doorbell rings
 [1.158130] xhci_hcd :00:07.0: Reset complete
 [1.158133] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses.
 [1.158135] xhci_hcd :00:07.0: Calling HCD init
 [1.158136] xhci_hcd :00:07.0: xhci_init
 [1.158137] xhci_hcd :00:07.0: xHCI doesn't need link TRB QUIRK
 [1.158640] xhci_hcd :00:07.0: Finished xhci_init
 [1.158642] xhci_hcd :00:07.0: Called HCD init
 [1.158698] xhci_hcd :00:07.0: irq 11, io mem 0xfebf4000
 [1.158699] xhci_hcd :00:07.0: xhci_run
 [1.159578] xhci_hcd :00:07.0: irq 40 for MSI/MSI-X
 [1.159697] xhci_hcd :00:07.0: irq 41 for MSI/MSI-X
 [1.159720] xhci_hcd :00:07.0: irq 42 for MSI/MSI-X
 [1.159736] xhci_hcd :00:07.0: irq 43 for MSI/MSI-X
 [1.159752] xhci_hcd :00:07.0: irq 44 for MSI/MSI-X
 [1.179682] xhci_hcd :00:07.0: Setting event ring polling timer
 [1.179686] xhci_hcd :00:07.0: Command ring memory map follows:
 [1.179693] xhci_hcd :00:07.0: ERST memory map follows:
 [1.179695] xhci_hcd :00:07.0: Event ring:
 [1.179702] xhci_hcd :00:07.0: ERST deq = 64'h36820400
 [1.179703] xhci_hcd :00:07.0: // Set the interrupt modulation register
 [1.179710] xhci_hcd :00:07.0: // Enable interrupts, cmd = 0x4.
 [1.179715] xhci_hcd :00:07.0: // Enabling event ring interrupter 
 c9e68620 by writing 0x2 to irq_pending
 [1.179737] xhci_hcd :00:07.0: Finished xhci_run for USB2 roothub
 [1.179752] usb usb4: New USB device found, idVendor=1d6b, idProduct=0002
 [1.179753] usb usb4: New USB device strings: Mfr=3, Product=2, 
 SerialNumber=1
 [1.179755] usb usb4: Product: xHCI Host Controller
 [1.179756] usb usb4: Manufacturer: Linux 3.8.0-rc6-2.10-desktop xhci_hcd
 [1.179757] usb usb4: SerialNumber: :00:07.0
 [1.179967] xHCI xhci_add_endpoint called for root hub
 [1.179971] xHCI xhci_check_bandwidth called for root hub
 [1.180081] hub 4-0:1.0: USB hub found
 [1.180094] hub 4-0:1.0: 2 ports detected
 [1.180200] xhci_hcd :00:07.0: xHCI Host Controller
 [1.180206] xhci_hcd :00:07.0: new USB bus registered, assigned bus 
 number 5
 [1.180214] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses.
 [1.180219] xhci_hcd :00:07.0: // Turn on HC, cmd = 0x5.
 [1.245201] xhci_hcd :00:07.0: Host took too long to start, waited 
 16000 microseconds.
 
 This one looks interesting.

Yep, the register never got to the state it was looking for.

 [1.245414] xhci_hcd :00:07.0: // Halt the HC
 [1.245424] xhci_hcd :00:07.0: startup error -19
 [1.245551] xhci_hcd :00:07.0: USB bus 5 deregistered
 [1.245556] xhci_hcd :00:07.0: remove, state 1
 [1.245560] usb usb4: USB disconnect, device number 1
 [1.245608] xHCI xhci_drop_endpoint called for root hub
 [1.245609] xHCI xhci_check_bandwidth called for root hub
 [1.245684] xhci_hcd :00:07.0: // Halt the HC
 [1.245695] xhci_hcd :00:07.0: // Reset the HC
 [1.245741] xhci_hcd :00:07.0: Wait for controller to be ready for 
 doorbell rings
 [1.256413] xhci_hcd :00:07.0: // Disabling event ring interrupts
 [1.256427] xhci_hcd :00:07.0: cleaning up memory
 [1.256440] xhci_hcd :00:07.0: xhci_stop completed - status = 1
 [1.256446] xhci_hcd :00:07.0: USB bus 4 deregistered
 [1.258194] ata_piix :00:01.1: version 2.13
 
 Within the guest, lspci -vv gives:
 
 00:07.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller 
 (rev 04) (prog-if 30 [XHCI])
 Subsystem: Intel Corporation Device 2008
 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
 Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 

[PATCH 66/77] vfio: convert to idr_alloc()

2013-02-06 Thread Tejun Heo
Convert to the much saner new idr interface.

Only compile tested.

v2: Restore accidentally dropped index 0 comment as suggested by
Alex.

Signed-off-by: Tejun Heo t...@kernel.org
Acked-by: Alex Williamson alex.william...@redhat.com
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12c264d..7f61abf 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -139,23 +139,8 @@ EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
  */
 static int vfio_alloc_group_minor(struct vfio_group *group)
 {
-	int ret, minor;
-
-again:
-	if (unlikely(idr_pre_get(&vfio.group_idr, GFP_KERNEL) == 0))
-		return -ENOMEM;
-
 	/* index 0 is used by /dev/vfio/vfio */
-	ret = idr_get_new_above(&vfio.group_idr, group, 1, &minor);
-	if (ret == -EAGAIN)
-		goto again;
-	if (ret || minor > MINORMASK) {
-		if (minor > MINORMASK)
-			idr_remove(&vfio.group_idr, minor);
-		return -ENOSPC;
-	}
-
-	return minor;
+	return idr_alloc(&vfio.group_idr, group, 1, MINORMASK + 1, GFP_KERNEL);
 }
 
 static void vfio_free_group_minor(int minor)
-- 
1.8.1
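
The range arguments in the converted call are the part most easily misread: idr_alloc() takes an inclusive start and an exclusive end, which is why the patch passes MINORMASK + 1 rather than MINORMASK. The following is a toy userspace model of that range behaviour only, not the kernel implementation; `toy_alloc`, `toy_used` and `TOY_MAX_IDS` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_IDS 16	/* hypothetical, tiny on purpose */

static bool toy_used[TOY_MAX_IDS];

/* Return the lowest free id in [start, end), or -ENOSPC if none is free. */
static int toy_alloc(int start, int end)
{
	int id;

	assert(start >= 0 && end <= TOY_MAX_IDS);
	for (id = start; id < end; id++) {
		if (!toy_used[id]) {
			toy_used[id] = true;
			return id;
		}
	}
	return -ENOSPC;
}
```

Calling toy_alloc(1, 3) hands out 1, then 2, then fails with -ENOSPC; id 0 is never returned, which mirrors how the patch keeps minor 0 reserved for /dev/vfio/vfio.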



Re: DMAR faults from unrelated device when vfio is used

2013-02-06 Thread Richard Weinberger
Hi,

On Wed, 06 Feb 2013 11:47:20 -0700, Alex Williamson alex.william...@redhat.com wrote:
 Does the card work with pci-assign or are both broken?

It works with pci-assign. :-\

 
 Possibly there's a bug in how we're managing the vector table and pba
 here.  Can you get to the monitor and run 'info mtree' and provide the
 results?  Thanks,

Please see attachment.

Thanks,
//richard

(qemu) info mtree
info mtree
memory
-7ffe (prio 0, RW): system
  -dfff (prio 0, RW): alias ram-below-4g @pc.ram 
-dfff
  000a-000b (prio 1, RW): alias smram-region @pci 
000a-000b
  000c-000c3fff (prio 1, R-): alias pam-rom @pc.ram 
000c-000c3fff
  000c4000-000c7fff (prio 1, R-): alias pam-rom @pc.ram 
000c4000-000c7fff
  000c8000-000cbfff (prio 1, R-): alias pam-rom @pc.ram 
000c8000-000cbfff
  000cb000-000cdfff (prio 1000, RW): alias kvmvapic-rom @pc.ram 
000cb000-000cdfff
  000cc000-000c (prio 1, R-): alias pam-rom @pc.ram 
000cc000-000c
  000d-000d3fff (prio 1, RW): alias pam-ram @pc.ram 
000d-000d3fff
  000d4000-000d7fff (prio 1, RW): alias pam-ram @pc.ram 
000d4000-000d7fff
  000d8000-000dbfff (prio 1, RW): alias pam-ram @pc.ram 
000d8000-000dbfff
  000dc000-000d (prio 1, RW): alias pam-ram @pc.ram 
000dc000-000d
  000e-000e3fff (prio 1, RW): alias pam-ram @pc.ram 
000e-000e3fff
  000e4000-000e7fff (prio 1, RW): alias pam-ram @pc.ram 
000e4000-000e7fff
  000e8000-000ebfff (prio 1, RW): alias pam-ram @pc.ram 
000e8000-000ebfff
  000ec000-000e (prio 1, RW): alias pam-ram @pc.ram 
000ec000-000e
  000f-000f (prio 1, R-): alias pam-rom @pc.ram 
000f-000f
  e000- (prio 0, RW): alias pci-hole @pci 
e000-
  fec0-fec00fff (prio 0, RW): kvm-ioapic
  fed0-fed003ff (prio 0, RW): hpet
  fee0-feef (prio 0, RW): kvm-apic-msi
  0001-00019fff (prio 0, RW): alias ram-above-4g @pc.ram 
e000-00017fff
  0001a000-40019fff (prio 0, RW): alias pci-hole64 @pci 
0001a000-40019fff
I/O
- (prio 0, RW): io
  0020-0021 (prio 0, RW): kvm-pic
  0040-0043 (prio 0, RW): kvm-pit
  0060-0060 (prio 0, RW): i8042-data
  0061-0061 (prio 0, RW): elcr
  0064-0064 (prio 0, RW): i8042-cmd
  0070-0071 (prio 0, RW): rtc
  007e-007f (prio 0, RW): kvmvapic
  0092-0092 (prio 0, RW): port92
  00a0-00a1 (prio 0, RW): kvm-pic
  0170-0177 (prio 0, RW): alias ide @ide 
0170-0177
  01ce-01d0 (prio 0, RW): alias vbe @vbe 
01ce-01d0
  01f0-01f7 (prio 0, RW): alias ide @ide 
01f0-01f7
  0376-0376 (prio 0, RW): alias ide @ide 
0376-0376
  0378-037f (prio 0, RW): alias parallel @parallel 
0378-037f
  03b4-03b5 (prio 0, RW): alias vga @vga 
03b4-03b5
  03ba-03ba (prio 0, RW): alias vga @vga 
03ba-03ba
  03c0-03cf (prio 0, RW): alias vga @vga 
03c0-03cf
  03d4-03d5 (prio 0, RW): alias vga @vga 
03d4-03d5
  03da-03da (prio 0, RW): alias vga @vga 
03da-03da
  03f1-03f5 (prio 0, RW): alias fdc @fdc 
03f1-03f5
  03f6-03f6 (prio 0, RW): alias ide @ide 
03f6-03f6
  03f7-03f7 (prio 0, RW): alias fdc @fdc 
03f7-03f7
  03f8-03ff (prio 0, RW): serial
  04d0-04d0 (prio 0, RW): kvm-elcr
  04d1-04d1 (prio 0, RW): kvm-elcr
  0510-0511 (prio 0, RW): fwcfg
  0cf8-0cfb (prio 0, RW): pci-conf-idx
  0cfc-0cff (prio 0, RW): pci-conf-data
  5658-5658 (prio 0, 

Re: [PATCH 2/4] Expand the steal time msr to also contain the consigned time.

2013-02-06 Thread Rik van Riel

On 02/05/2013 04:49 PM, Michael Wolf wrote:

Expand the steal time msr to also contain the consigned time.

Signed-off-by: Michael Wolf m...@linux.vnet.ibm.com
---
  arch/x86/include/asm/paravirt.h   |4 ++--
  arch/x86/include/asm/paravirt_types.h |2 +-
  arch/x86/kernel/kvm.c |7 ++-
  kernel/sched/core.c   |   10 +-
  kernel/sched/cputime.c|2 +-
  5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..9b753ea 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
  extern struct static_key paravirt_steal_enabled;
  extern struct static_key paravirt_steal_rq_enabled;

-static inline u64 paravirt_steal_clock(int cpu)
+static inline void paravirt_steal_clock(int cpu, u64 *steal)
  {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
  }


This may be a stupid question, but what happens if a KVM
guest with this change runs on a host kernel that still has
the old steal time interface?

What happens if the host has the new steal time interface,
but the guest uses the old interface?

Will both cases continue to work as expected with your
patch series?

If so, could you document (in the source code) why things
continue to work?



Re: [PATCH 3/4] Add the code to send the consigned time from the host to the guest

2013-02-06 Thread Rik van Riel

On 02/05/2013 04:49 PM, Michael Wolf wrote:

Change the paravirt calls that retrieve the steal-time information
from the host.  Add to it getting the consigned value as well as
the steal time.

Signed-off-by: Michael Wolf m...@linux.vnet.ibm.com



diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..55d617f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -42,9 +42,10 @@

  struct kvm_steal_time {
__u64 steal;
+   __u64 consigned;
__u32 version;
__u32 flags;
-   __u32 pad[12];
+   __u32 pad[10];
  };


The function kvm_register_steal_time passes the address of such
a structure to the host kernel, which then does something with
it.

Could running a guest with the above patch, on top of a host
with the old code, result in the values for version and
flags being written into consigned?

Could that result in confusing the guest kernel to no end,
and generally breaking things?
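
The concern can be checked mechanically. The sketch below restates the two layouts from the hunk above with <stdint.h> types standing in for the kernel's __u64/__u32 (an assumption made so it compiles in userspace; the struct names are hypothetical) and compares field offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch, as a host with the old code would write it. */
struct steal_time_old {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[12];
};

/* Layout after the patch, as a guest with the new code would read it. */
struct steal_time_new {
	uint64_t steal;
	uint64_t consigned;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[10];
};

/* Both layouts keep the same total size ... */
static_assert(sizeof(struct steal_time_old) == 64, "old size");
static_assert(sizeof(struct steal_time_new) == 64, "new size");

/* ... but the old version/flags pair lands on the new consigned field. */
static_assert(offsetof(struct steal_time_old, version) ==
	      offsetof(struct steal_time_new, consigned), "overlap");
static_assert(offsetof(struct steal_time_old, flags) ==
	      offsetof(struct steal_time_new, consigned) + 4, "overlap");
```

Under these assumptions the bytes at offsets 8..15 hold version and flags for an old host and consigned for a new guest, so the answer to the first question appears to be yes unless something else negotiates the layout.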


RE: [PATCH V5] target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

2013-02-06 Thread Auld, Will
Marcelo, Hi,

I have been watching for this patch in the upstream but have not seen it yet. 
What version of QEMU should it be in?

Thanks,

Will

 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Friday, November 30, 2012 12:40 PM
 To: Auld, Will
 Cc: qemu-devel; Gleb; Andreas Farber; kvm@vger.kernel.org; Dugger,
 Donald D; Liu, Jinsong; Zhang, Xiantao; a...@redhat.com
 Subject: Re: [PATCH V5] target-i386: Enabling IA32_TSC_ADJUST for QEMU
 KVM guest VMs
 
 On Mon, Nov 26, 2012 at 09:32:18PM -0800, Will Auld wrote:
  CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
 
  Basic design is to emulate the MSR by allowing reads and writes to the
  hypervisor vcpu specific locations to store the value of the emulated MSRs.
  In this way the IA32_TSC_ADJUST value will be included in all reads to
  the TSC MSR whether through rdmsr or rdtsc.
 
  As this is a new MSR that the guest may access and modify, its value
  needs to be migrated along with the other MSRs. The changes here are
  specifically for recognizing when IA32_TSC_ADJUST is enabled in CPUID
  and code added for migrating its value.
 
  Signed-off-by: Will Auld will.a...@intel.com
  ---
  Andreas,
 
  Thanks, that helped. I used Stefan's auto-run method this time.
 
  Will
 
   target-i386/cpu.h |  2 ++
   target-i386/kvm.c | 14 ++
   target-i386/machine.c | 21 +
   3 files changed, 37 insertions(+)
 
 Applied, thanks.
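
For reference, the feature test named at the top of the quoted commit message (CPUID.7.0.EBX[1]) can be sketched in userspace as below. This is not the QEMU code from the patch; `ebx_has_tsc_adjust` and `host_has_tsc_adjust` are hypothetical helpers, and the host query assumes a compiler whose <cpuid.h> provides __get_cpuid_count (recent GCC and clang do).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSC_ADJUST_EBX_BIT 1	/* CPUID.(EAX=7,ECX=0):EBX[1] */

static_assert(TSC_ADJUST_EBX_BIT < 32, "bit index must fit in EBX");

/* Pure helper: decode the feature bit from an EBX value. */
static bool ebx_has_tsc_adjust(uint32_t ebx)
{
	return (ebx >> TSC_ADJUST_EBX_BIT) & 1u;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Query the running cpu; returns false if leaf 7 is not available. */
static bool host_has_tsc_adjust(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx_has_tsc_adjust(ebx);
}
#endif
```

Keeping the bit decode separate from the cpuid instruction makes the check testable on any machine, including ones without the feature.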



Re: DMAR faults from unrelated device when vfio is used

2013-02-06 Thread Alex Williamson
On Wed, 2013-02-06 at 21:25 +0100, Richard Weinberger wrote:
 Hi,
 
 On Wed, 06 Feb 2013 11:47:20 -0700, Alex Williamson alex.william...@redhat.com wrote:
  Does the card work with pci-assign or are both broken?
 
 It works with pci-assign. :-\

When you tested this, did you detach the group from vfio or use it as
is?  In your previous message I see this:

03:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev ff)

/sys/kernel/iommu_groups/7/devices:
total 0
lrwxrwxrwx 1 root root 0 Feb  4 10:29 0000:00:1c.0 -> ../../../../devices/pci0000:00/0000:00:1c.0
lrwxrwxrwx 1 root root 0 Feb  4 10:29 0000:00:1c.6 -> ../../../../devices/pci0000:00/0000:00:1c.6
lrwxrwxrwx 1 root root 0 Feb  4 10:29 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:1c.6/0000:03:00.0

This seemed like a good card to have in my test cache, so I went and got
one and it works fine for me... but I've been playing with pcieport
because I don't think we're handling them correctly in vfio.

Can you provide lspci -vvv -s 1c.6 while the guest is running?  I'm
going to bet that

Control: I/O+ Mem+ BusMaster+

is not set, which it would have been if pci-assign was tested without
the group bound to vfio.  I think the solution is going to be something
around white-listing pcieport, which you can easily test with a kernel
patch like this:

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12c264d..48a97fb 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -442,7 +442,7 @@ static struct vfio_device *vfio_group_get_device(struct vfio
  * a device.  It's not always practical to leave a device within a group
  * driverless as it could get re-bound to something unsafe.
  */
-static const char * const vfio_driver_whitelist[] = { "pci-stub" };
+static const char * const vfio_driver_whitelist[] = { "pci-stub", "pcieport" };
 
 static bool vfio_whitelisted_driver(struct device_driver *drv)
 {

Then you won't need to bind 1c.0 or 1c.6 to vfio-pci and hopefully
things will work.  The other problem you might hit is that the pciehp
service driver may also be bound to these slots and somehow deletes the
pci device and re-adds it when a device reset happens.  This causes all
sorts of badness.  The solution here is to unbind the child device from
pciehp, ie:

echo 0000:00:1c.0:pcie04 | sudo \
tee /sys/bus/pci_express/drivers/pciehp/unbind
echo 0000:00:1c.6:pcie04 | sudo \
tee /sys/bus/pci_express/drivers/pciehp/unbind

Hopefully combined that will make things work, please let me know.
Another option is to move the device to a slot where it isn't grouped
with the root port above it, assuming it's a plugin card.  Also if we
could determine that these root ports support PCI ACS but just don't
report it, we could change the grouping and avoid root ports grouped
with devices.
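
To make the effect of the whitelist patch above concrete, here is a minimal userspace sketch of a name-based whitelist lookup of the kind vfio_whitelisted_driver() performs. It is an illustration, not the kernel function: it takes a plain driver name instead of a struct device_driver, and `driver_whitelist` and `whitelisted_driver` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Whitelist as it would look with the test patch applied. */
static const char * const driver_whitelist[] = { "pci-stub", "pcieport" };

/* True if a device bound to this driver may stay in a vfio group. */
static bool whitelisted_driver(const char *name)
{
	size_t i;
	size_t n = sizeof(driver_whitelist) / sizeof(driver_whitelist[0]);

	assert(name != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(name, driver_whitelist[i]) == 0)
			return true;
	}
	return false;
}
```

With "pcieport" in the list, the root ports at 1c.0 and 1c.6 can keep their normal driver while the group is in use; the pciehp service driver is not covered by the list, which is why the unbind step is still suggested separately.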

I'm still trying to formulate how to fix this long term, whether we
should whitelist pcieport and require userspace to do this kind of set
(need a hotplug stub driver?) or if vfio-pci needs to gain some basic
pcieport functionality that can enable the device and bind service
drivers we want (aer) and avoid ones we don't (pciehp).  Thanks,

Alex



Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-06 Thread Marcelo Tosatti
On Tue, Feb 05, 2013 at 09:32:50AM +0200, Gleb Natapov wrote:
 On Mon, Feb 04, 2013 at 06:47:30PM -0200, Marcelo Tosatti wrote:
  On Mon, Feb 04, 2013 at 05:59:52PM -0200, Marcelo Tosatti wrote:
   On Mon, Feb 04, 2013 at 07:13:01PM +0200, Gleb Natapov wrote:
On Mon, Feb 04, 2013 at 12:43:45PM -0200, Marcelo Tosatti wrote:
   Any example how software relies on such 
   two-interrupts-queued-in-IRR/ISR behaviour?
  Don't know about guests, but KVM relies on it to detect interrupt
  coalescing. So if interrupt is set in IRR but not in PIR interrupt 
  will
  not be reported as coalesced, but it will be coalesced during 
  PIR-IRR
  merge.
 
 Yes, so:
 
 1. IRR=1, ISR=0, PIR=0. Event: set_irq, coalesced=no.
 2. IRR=0, ISR=1, PIR=0. Event: IRR-ISR transfer.
 3. vcpu outside of guest mode.
 4. IRR=1, ISR=1, PIR=0. Event: set_irq, coalesced=no.
 5. vcpu enters guest mode.
 6. IRR=1, ISR=1, PIR=1. Event: set_irq, coalesced=no.
 7. HW transfers PIR into IRR.
 
 set_irq return value at 7 is incorrect, interrupt event was _not_
 queued.
Not sure I understand the flow of events in your description correctly. 
As I
understand it at 4 set_irq() will return incorrect result. Basically
when PIR is set to 1 while IRR has 1 for the vector the value of
set_irq() will be incorrect.
   
   At 4 it has not been coalesced: it has been queued to IRR.
   At 6 it has been coalesced: PIR bit merged into IRR bit.
   
 Yes, that's the case.
 
Frankly I do not see how it can be fixed
without any race with present HW PIR design.
   
   At kvm_accept_apic_interrupt, check IRR before setting PIR bit, if IRR
   already set, don't set PIR.
 Need to check both IRR and PIR. Something like that:
 
 apic_accept_interrupt() {
  if (PIR || IRR)
return coalesced;
  else
set PIR;
 }
 
 This has two problems. First, an interrupt that could be delivered will
 not be (IRR is cleared just after it was tested), but it will be reported
 as coalesced, so this is a benign race.

Yes, and the same condition exists today with IRR, so it's fine.

 Second is that interrupt may be
 reported as delivered, but it will be coalesced (possible only with the self
 IPI with the same vector):
 
 Starting condition: PIR=0, IRR=0 vcpu is in a guest mode
 
 io thread |   vcpu
 accept_apic_interrupt()   |
  PIR and IRR is zero  |
  set PIR  |
  return delivered |
   |  self IPI
   |  set IRR
   | merge PIR to IRR (*)
 
 At (*) interrupt that was reported as delivered is coalesced.

Only the vcpu itself should send a self-IPI, so it's fine.

  Or:
  
  apic_accept_interrupt() {
  
  1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
  Never set IRR when HWAPIC enabled, even if outside of guest mode.
  2. Set PIR and let HW or SW VM-entry transfer it to IRR.
  3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
  }
  
 This can report interrupt as coalesced, but it will be eventually delivered
 as separate interrupt:
 
 Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
 
  io thread  | vcpu
 | 
 accept_apic_interrupt() |
 ORIG_PIR=0, ORIG_IRR=1  |
 |EOI
 |clear IRR, set ISR
 set PIR |
 return coalesced|
 |clear PIR, set IRR
 |EOI
 |clear IRR, set ISR (*)
 
 At (*) interrupt that was reported as coalesced is delivered.
 
 
 So still no perfect solution. But first one has much less serious
 problems for our practical needs.
 
  Two or more concurrent set_irq can race with each other, though. Can
  either document the race or add a lock.
  
 
 --
   Gleb.

Ok, then:

accept_apic_irq:
1. coalesced = test_and_set_bit(PIR)
2. set KVM_REQ_EVENT bit (*)
3. if (vcpu->in_guest_mode)
4.     if (test_and_set_bit(pir notification bit))
5.         send PIR IPI
6. return coalesced

Other sites:
A: On VM-entry, after disabling interrupts, but before
the last check for ->requests, clear pir notification bit
(unconditionally).

(*) This is _necessary_ also because during VM-exit a PIR IPI interrupt can
be missed, so the KVM_REQ_EVENT indicates that SW is responsible for
PIR->IRR transfer.
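
The steps above can be modelled in userspace with C11 atomics, as a rough sketch only. Everything here is hypothetical scaffolding (`toy_vcpu`, a 32-vector PIR, a counter in place of a real IPI). One assumption is worth calling out: step 4 is interpreted as sending the IPI only when the notification bit was previously clear, which is an interpretation of the pseudocode, not something it states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vcpu {
	atomic_uint pir;		/* one bit per vector, 32 vectors only */
	atomic_bool notify_pending;	/* stands in for the pir notification bit */
	atomic_bool req_event;		/* stands in for KVM_REQ_EVENT */
	atomic_bool in_guest_mode;
	atomic_uint ipis_sent;		/* counts simulated PIR IPIs */
};

/* Returns true if the interrupt was coalesced with one already posted. */
static bool toy_accept_apic_irq(struct toy_vcpu *v, unsigned int vector)
{
	unsigned int mask;
	bool coalesced;

	assert(vector < 32);
	mask = 1u << vector;

	/* 1. atomically set the PIR bit and remember its old value */
	coalesced = atomic_fetch_or(&v->pir, mask) & mask;
	/* 2. software may have to do the PIR->IRR transfer itself */
	atomic_store(&v->req_event, true);
	/* 3-5. notify a running vcpu, at most once per pending batch (assumed) */
	if (atomic_load(&v->in_guest_mode) &&
	    !atomic_exchange(&v->notify_pending, true))
		atomic_fetch_add(&v->ipis_sent, 1);
	/* 6. */
	return coalesced;
}

/* Site A, simplified: clear the notification bit, then enter the guest. */
static void toy_vm_entry(struct toy_vcpu *v)
{
	atomic_store(&v->notify_pending, false);
	atomic_store(&v->in_guest_mode, true);
}
```

The coalesced result depends only on the atomic read-modify-write of the PIR bit, so two concurrent callers for the same vector cannot both see "not coalesced". The model does not capture IRR at all, so the IRR-related races discussed earlier in the thread are outside its scope.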




Re: [PATCH V5] target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

2013-02-06 Thread Marcelo Tosatti
On Wed, Feb 06, 2013 at 10:22:32PM +, Auld, Will wrote:
 Marcelo, Hi,
 
 I have been watching for this patch in the upstream but have not seen it yet. 
 What version of QEMU should it be in?
 
 Thanks,
 
 Will

Will, it's in the git tree:
https://github.com/qemu/qemu/commit/f28558d3d37ad3bc4e35e8ac93f7bf81a0d5622c

As for the next release:
http://www.mail-archive.com/qemu-devel@nongnu.org/msg153579.html



Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-06 Thread Marcelo Tosatti
On Wed, Feb 06, 2013 at 08:49:23PM -0200, Marcelo Tosatti wrote:
 On Tue, Feb 05, 2013 at 09:32:50AM +0200, Gleb Natapov wrote:
  On Mon, Feb 04, 2013 at 06:47:30PM -0200, Marcelo Tosatti wrote:
   On Mon, Feb 04, 2013 at 05:59:52PM -0200, Marcelo Tosatti wrote:
On Mon, Feb 04, 2013 at 07:13:01PM +0200, Gleb Natapov wrote:
 On Mon, Feb 04, 2013 at 12:43:45PM -0200, Marcelo Tosatti wrote:
Any example how software relies on such 
two-interrupts-queued-in-IRR/ISR behaviour?
   Don't know about guests, but KVM relies on it to detect interrupt
   coalescing. So if interrupt is set in IRR but not in PIR 
   interrupt will
   not be reported as coalesced, but it will be coalesced during 
   PIR-IRR
   merge.
  
  Yes, so:
  
  1. IRR=1, ISR=0, PIR=0. Event: set_irq, coalesced=no.
  2. IRR=0, ISR=1, PIR=0. Event: IRR-ISR transfer.
  3. vcpu outside of guest mode.
  4. IRR=1, ISR=1, PIR=0. Event: set_irq, coalesced=no.
  5. vcpu enters guest mode.
  6. IRR=1, ISR=1, PIR=1. Event: set_irq, coalesced=no.
  7. HW transfers PIR into IRR.
  
  set_irq return value at 7 is incorrect, interrupt event was _not_
  queued.
 Not sure I understand the flow of events in your description 
 correctly. As I
 understand it at 4 set_irq() will return incorrect result. Basically
 when PIR is set to 1 while IRR has 1 for the vector the value of
 set_irq() will be incorrect.

At 4 it has not been coalesced: it has been queued to IRR.
At 6 it has been coalesced: PIR bit merged into IRR bit.

  Yes, that's the case.
  
 Frankly I do not see how it can be fixed
 without any race with present HW PIR design.

At kvm_accept_apic_interrupt, check IRR before setting PIR bit, if IRR
already set, don't set PIR.
  Need to check both IRR and PIR. Something like that:
  
  apic_accept_interrupt() {
   if (PIR || IRR)
 return coalesced;
   else
 set PIR;
  }
  
  This has two problems. First, an interrupt that could be delivered will
  not be (IRR is cleared just after it was tested), but it will be reported
  as coalesced, so this is a benign race.
 
 Yes, and the same condition exists today with IRR, so it's fine.
 
  Second is that interrupt may be
  reported as delivered, but it will be coalesced (possible only with the self
  IPI with the same vector):
  
  Starting condition: PIR=0, IRR=0 vcpu is in a guest mode
  
  io thread |   vcpu
  accept_apic_interrupt()   |
   PIR and IRR is zero  |
   set PIR  |
   return delivered |
|  self IPI
|  set IRR
| merge PIR to IRR (*)
  
  At (*) interrupt that was reported as delivered is coalesced.
 
 Only vcpu itself should send self-IPI, so it's fine.
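
The check-then-set proposal quoted above ("if (PIR || IRR) return coalesced; else set PIR") can be sketched in userspace C roughly as follows. This is only an illustration, not KVM code: struct vec_state, accept_check_then_set() and the demo function are hypothetical names, and a single C11 atomic_bool stands in for one vector's bit in each register. The test and the set are separate steps, which is exactly where the two races described above come from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one vector's bit in IRR and in PIR. */
struct vec_state {
    atomic_bool irr;
    atomic_bool pir;
};

void vec_state_init(struct vec_state *v)
{
    atomic_init(&v->irr, false);
    atomic_init(&v->pir, false);
}

/*
 * Returns true if the interrupt is reported as coalesced.
 * The check and the store are not one atomic operation, so the
 * vcpu can clear IRR or merge PIR into IRR in between.
 */
bool accept_check_then_set(struct vec_state *v)
{
    if (atomic_load(&v->pir) || atomic_load(&v->irr))
        return true;              /* already pending: coalesced */
    atomic_store(&v->pir, true);  /* queue it in PIR */
    return false;                 /* reported as delivered */
}

/* Single-threaded walk through the two basic outcomes. */
void demo_check_then_set(void)
{
    struct vec_state v;

    vec_state_init(&v);
    assert(!accept_check_then_set(&v)); /* PIR=0, IRR=0: delivered */
    assert(accept_check_then_set(&v));  /* PIR=1: coalesced */
}
```

The demo only exercises the uncontended paths; the races discussed in the thread need a second thread playing the vcpu.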
 
   Or:
   
   apic_accept_interrupt() {
   
   1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
   Never set IRR when HWAPIC enabled, even if outside of guest mode.
   2. Set PIR and let HW or SW VM-entry transfer it to IRR.
   3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
   }
   
  This can report interrupt as coalesced, but it will be eventually delivered
  as separate interrupt:
  
  Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
  
   io thread  | vcpu
  | 
  accept_apic_interrupt() |
  ORIG_PIR=0, ORIG_IRR=1  |
  |EOI
  |clear IRR, set ISR
  set PIR |
  return coalesced|
  |clear PIR, set IRR
  |EOI
  |clear IRR, set ISR (*)
  
  At (*) interrupt that was reported as coalesced is delivered.
  
  
  So still no perfect solution. But first one has much less serious
  problems for our practical needs.
  
   Two or more concurrent set_irq can race with each other, though. Can
   either document the race or add a lock.
   
  
  --
  Gleb.
 
 Ok, then:
 
 accept_apic_irq:
 1. coalesced = test_and_set_bit(PIR)
 2. set KVM_REQ_EVENT bit  (*)
 3. if (vcpu->in_guest_mode)
 4.    if (test_and_set_bit(pir notification bit))
 5.        send PIR IPI
 6. return coalesced
 
 Other sites:
 A: On VM-entry, after disabling interrupts, but before
 the last check for ->requests, clear pir notification bit
 (unconditionally).
 
 (*) This is _necessary_ also because during VM-exit a PIR IPI interrupt
 can be missed, so the KVM_REQ_EVENT indicates that SW is responsible for
 PIR->IRR transfer.

It's not a bad idea to have a new KVM_REQ_ bit for PIR processing (just
as the current patches do).
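
For illustration, the accept_apic_irq steps and site A above might look like this in userspace C. It is a sketch under stated assumptions, not the actual patch: struct vcpu_sketch and all of its fields are hypothetical, atomic_exchange() stands in for the kernel's test_and_set_bit(), and an atomic counter replaces really sending the PIR IPI. One deliberate difference: the sketch sends the IPI only when the notification bit was previously clear, which is an assumed reading of the intent of step 4; as literally written, step 4 tests the opposite condition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vcpu state for a single vector. */
struct vcpu_sketch {
    atomic_bool pir;           /* one vector's PIR bit */
    atomic_bool req_event;     /* stands in for KVM_REQ_EVENT */
    atomic_bool pir_notify;    /* the "pir notification bit" */
    atomic_bool in_guest_mode;
    atomic_int  ipis_sent;     /* counts simulated PIR IPIs */
};

void vcpu_sketch_init(struct vcpu_sketch *v)
{
    atomic_init(&v->pir, false);
    atomic_init(&v->req_event, false);
    atomic_init(&v->pir_notify, false);
    atomic_init(&v->in_guest_mode, false);
    atomic_init(&v->ipis_sent, 0);
}

/* Returns true if the interrupt was coalesced (PIR bit already set). */
bool accept_apic_irq(struct vcpu_sketch *v)
{
    /* 1. atomic test-and-set: returns the previous value of the bit */
    bool coalesced = atomic_exchange(&v->pir, true);

    /* 2. make software responsible for the PIR->IRR transfer */
    atomic_store(&v->req_event, true);

    /* 3-5. notify a running vcpu, at most once per notification cycle
     * (assumption: send only if the bit was previously clear) */
    if (atomic_load(&v->in_guest_mode) &&
        !atomic_exchange(&v->pir_notify, true))
        atomic_fetch_add(&v->ipis_sent, 1);

    /* 6. */
    return coalesced;
}

/* Site A: on VM-entry, clear the notification bit unconditionally. */
void vm_entry_clear_notify(struct vcpu_sketch *v)
{
    atomic_store(&v->pir_notify, false);
}

void demo_accept_apic_irq(void)
{
    struct vcpu_sketch v;

    vcpu_sketch_init(&v);
    atomic_store(&v.in_guest_mode, true);
    assert(!accept_apic_irq(&v));            /* first set: not coalesced */
    assert(accept_apic_irq(&v));             /* second set: coalesced */
    assert(atomic_load(&v.ipis_sent) == 1);  /* only one IPI was needed */
}
```

Because step 1 is a single atomic read-modify-write, two concurrent callers cannot both see the bit clear, which addresses the set_irq vs. set_irq race mentioned earlier without a lock.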



Re: [PATCH] tcm_vhost: Multi-queue support

2013-02-06 Thread Asias He
On 02/06/2013 07:59 PM, Paolo Bonzini wrote:
 
 
 - Original Message -
 From: Asias He as...@redhat.com
 To: Nicholas A. Bellinger n...@linux-iscsi.org
 Cc: Paolo Bonzini pbonz...@redhat.com, Stefan Hajnoczi
 stefa...@redhat.com, Michael S. Tsirkin m...@redhat.com,
 Rusty Russell ru...@rustcorp.com.au, kvm@vger.kernel.org,
 virtualizat...@lists.linux-foundation.org, target-de...@vger.kernel.org
 Sent: Wednesday, 6 February 2013 10:51:34
 Subject: Re: [PATCH] tcm_vhost: Multi-queue support

 On 02/06/2013 04:39 PM, Nicholas A. Bellinger wrote:
 On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
 On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
 On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
 This adds virtio-scsi multi-queue support to tcm_vhost.

 Guest side virtio-scsi multi-queue support can be found here:

https://lkml.org/lkml/2012/12/18/166

 Some initial perf numbers:
 1 queue,  4 targets, 1 lun per target
 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS

 4 queues, 4 targets, 1 lun per target
 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
 
 4 VCPUs I suppose?

Yes.

-- 
Asias


Re: [PATCH v3] KVM: MMU: lazily drop large spte

2013-02-06 Thread Marcelo Tosatti
On Tue, Feb 05, 2013 at 03:11:09PM +0800, Xiao Guangrong wrote:
 Currently, kvm zaps the large spte if write-protection is needed, so a
 later read can fault on that spte. Actually, we can make the large spte
 read-only instead of making it non-present, so the page fault caused by
 read access can be avoided.
 
 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it reduces
 | jitter.  This removes the need for the return value.
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com

Applied, thanks.



Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-06 Thread Marcelo Tosatti
  According to the SDM, software should not touch the IRR when the target
  vcpu is running. Instead, use a locked way to access PIR. So your
  solution may be wrong.
  Then your apicv patches are broken, because they do exactly that.
  Which code is broken?
  
  The one that updates IRR directly on the apic page.
  No, all the updates are ensuring the target vcpu is not running. So
  it's safe to touch IRR.
  
  Not at all. Read the code.
  Sorry. I still cannot figure out which code is wrong. All the places
  that call sync_pir_to_irr() are on the target vcpu. Can you point out
  the code? Thanks.
  
  I am talking about vapic patches which are already in, not pir patches.
 Yes, but the issue will be fixed with pir patches. With posted interrupt, it
 will touch PIR instead of IRR, and accessing PIR is allowed by HW.
 
 Best regards,
 Yang
 

From http://www.mail-archive.com/kvm@vger.kernel.org/msg82824.html:


 2. Section 29.6 mentions that "Use of the posted-interrupt descriptor
 differs from that of other data structures that are referenced by
 pointers in a VMCS. There is a general requirement that software ensure
 that each such data structure is modified only when no logical processor
 with a current VMCS that references it is in VMX non-root operation.
 That requirement does not apply to the posted-interrupt descriptor.
 There is a requirement, however, that such modifications be done using
 locked read-modify-write instructions."

 The APIC virtual page is being modified by a CPU while a logical
 processor with current VMCS that references it is in VMX non-root
 operation, in fact even modifying the APIC virtual page with EOI
 virtualization, virtual interrupt delivery, etc. What are the
 requirements in this case?
It should be same with posted interrupt. Software must ensure to use
atomic access to virtual apic page.


Can this point be clarified? Software can or cannot access virtual APIC
page while VMCS that references it is in VMX non-root operation?

Because if it cannot, then it means the current code is broken and
VID usage without PIR should not be allowed.
