[PATCH 3/6] kvm tools: Rework stdio/stdout handling to support redirection
Currently if you redirect the output from lkvm run to a file then term_init() will fail, because it can't call the terminal ioctls.

So check if stdin and stdout are ttys; if either is not, skip the rest of the terminal setup. Redirecting one but not the other is a little odd, but does work.

Note that we skip registering the cleanup routines, so we don't need to modify them.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/term.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index 4413450..fa85e4a 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -140,6 +140,15 @@ int term_init(struct kvm *kvm)
 	struct termios term;
 	int i, r;
 
+	for (i = 0; i < 4; i++)
+		if (term_fds[i][TERM_FD_IN] == 0) {
+			term_fds[i][TERM_FD_IN] = STDIN_FILENO;
+			term_fds[i][TERM_FD_OUT] = STDOUT_FILENO;
+		}
+
+	if (!isatty(STDIN_FILENO) || !isatty(STDOUT_FILENO))
+		return 0;
+
 	r = tcgetattr(STDIN_FILENO, &orig_term);
 	if (r < 0) {
 		pr_warning("unable to save initial standard input settings");
@@ -151,12 +160,6 @@ int term_init(struct kvm *kvm)
 	term.c_lflag &= ~(ICANON | ECHO | ISIG);
 	tcsetattr(STDIN_FILENO, TCSANOW, &term);
 
-	for (i = 0; i < 4; i++)
-		if (term_fds[i][TERM_FD_IN] == 0) {
-			term_fds[i][TERM_FD_IN] = STDIN_FILENO;
-			term_fds[i][TERM_FD_OUT] = STDOUT_FILENO;
-		}
-
 	signal(SIGTERM, term_sig_cleanup);
 	atexit(term_cleanup);
 
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
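The tty check described in this patch can be sketched in isolation. This is only an illustration, not code from term.c: the helper name term_should_configure() is hypothetical, and the only library call used is POSIX isatty().

```c
#include <unistd.h>

/*
 * Hypothetical helper (not part of term.c): returns 1 only when both
 * descriptors refer to terminals, i.e. when it is safe to go on and
 * call tcgetattr()/tcsetattr() on them.
 */
static int term_should_configure(int in_fd, int out_fd)
{
	/* A pipe, a regular file or a closed descriptor is not a tty. */
	if (!isatty(in_fd) || !isatty(out_fd))
		return 0;

	return 1;
}
```

A caller would return early from its setup routine when this yields 0, which also means no cleanup handlers get registered for a terminal that was never reconfigured.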
[PATCH 2/6] kvm tools: More error handling in the ipc code
Add perror() calls to a couple of exit paths, to ease debugging.

There are also two places where we print "Failed starting IPC thread", but one is really an epoll failure, so make that obvious.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/kvm-ipc.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
index bdcc0d1..7897519 100644
--- a/tools/kvm/kvm-ipc.c
+++ b/tools/kvm/kvm-ipc.c
@@ -49,18 +49,25 @@ static int kvm__create_socket(struct kvm *kvm)
 	}
 
 	s = socket(AF_UNIX, SOCK_STREAM, 0);
-	if (s < 0)
+	if (s < 0) {
+		perror("socket");
 		return s;
+	}
+
 	local.sun_family = AF_UNIX;
 	strlcpy(local.sun_path, full_name, sizeof(local.sun_path));
 	len = strlen(local.sun_path) + sizeof(local.sun_family);
 	r = bind(s, (struct sockaddr *)&local, len);
-	if (r < 0)
+	if (r < 0) {
+		perror("bind");
 		goto fail;
+	}
 
 	r = listen(s, 5);
-	if (r < 0)
+	if (r < 0) {
+		perror("listen");
 		goto fail;
+	}
 
 	return s;
 
@@ -430,6 +437,7 @@ int kvm_ipc__init(struct kvm *kvm)
 
 	epoll_fd = epoll_create(KVM_IPC_MAX_MSGS);
 	if (epoll_fd < 0) {
+		perror("epoll_create");
 		ret = epoll_fd;
 		goto err;
 	}
@@ -437,13 +445,14 @@ int kvm_ipc__init(struct kvm *kvm)
 	ev.events = EPOLLIN | EPOLLET;
 	ev.data.fd = sock;
 	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, &ev) < 0) {
-		pr_err("Failed starting IPC thread");
+		pr_err("Failed adding socket to epoll");
 		ret = -EFAULT;
 		goto err_epoll;
 	}
 
 	stop_fd = eventfd(0, 0);
 	if (stop_fd < 0) {
+		perror("eventfd");
 		ret = stop_fd;
 		goto err_epoll;
 	}
--
1.7.10.4
[PATCH 4/6] kvm tools: powerpc: Fix buglet in xics_init() handling of nrcpus
In xics_init() we set the maximum server to kvm->nrcpus, and then set nr_servers to maximum server + 1. That is off by one, in the harmless direction.

Simplify it to just set nr_servers = kvm->nrcpus.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/xics.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/kvm/powerpc/xics.c b/tools/kvm/powerpc/xics.c
index d4b5caa..cf64a08 100644
--- a/tools/kvm/powerpc/xics.c
+++ b/tools/kvm/powerpc/xics.c
@@ -445,16 +445,13 @@ static void rtas_int_on(struct kvm_cpu *vcpu, uint32_t token,
 
 static int xics_init(struct kvm *kvm)
 {
-	int max_server_num;
 	unsigned int i;
 	struct icp_state *icp;
 	struct ics_state *ics;
 	int j;
 
-	max_server_num = kvm->nrcpus;
-
 	icp = malloc(sizeof(*icp));
-	icp->nr_servers = max_server_num + 1;
+	icp->nr_servers = kvm->nrcpus;
 	icp->ss = malloc(icp->nr_servers * sizeof(struct icp_server_state));
 
 	for (i = 0; i < icp->nr_servers; i++) {
--
1.7.10.4
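To make the off-by-one concrete: with server ids running from 0 to nrcpus - 1, exactly nrcpus slots are needed. The sketch below is a standalone illustration under that assumption; struct icp_sketch, struct server_state and icp_sketch_alloc() are hypothetical stand-ins, not the types used in xics.c.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-server state kept by the ICP. */
struct server_state {
	int pending;
};

/* Hypothetical stand-in for the ICP state. */
struct icp_sketch {
	unsigned int nr_servers;
	struct server_state *ss;
};

/* Allocate exactly one slot per CPU, for server ids 0 .. nrcpus - 1. */
static struct icp_sketch *icp_sketch_alloc(unsigned int nrcpus)
{
	struct icp_sketch *icp = malloc(sizeof(*icp));

	if (!icp)
		return NULL;

	icp->nr_servers = nrcpus;	/* not nrcpus + 1 */
	icp->ss = calloc(icp->nr_servers, sizeof(*icp->ss));
	if (!icp->ss) {
		free(icp);
		return NULL;
	}

	return icp;
}
```

With nrcpus = 4 the last valid index is ss[3]; the old "+ 1" merely allocated one unused extra slot, which is why the bug was harmless.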
[PATCH 1/6] kvm tools: Return error status in lkvm list
Currently list always returns 0, even if there was an error. Instead have it accumulate any errors and return that.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/builtin-list.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-list.c b/tools/kvm/builtin-list.c
index 9299f17..c35be93 100644
--- a/tools/kvm/builtin-list.c
+++ b/tools/kvm/builtin-list.c
@@ -123,7 +123,7 @@ static void parse_setup_options(int argc, const char **argv)
 
 int kvm_cmd_list(int argc, const char **argv, const char *prefix)
 {
-	int r;
+	int status, r;
 
 	parse_setup_options(argc, argv);
 
@@ -133,17 +133,23 @@ int kvm_cmd_list(int argc, const char **argv, const char *prefix)
 	printf("%6s %-20s %s\n", "PID", "NAME", "STATE");
 	printf("\n");
 
+	status = 0;
+
 	if (run) {
 		r = kvm_list_running_instances();
 		if (r < 0)
 			perror("Error listing instances");
+
+		status |= r;
 	}
 
 	if (rootfs) {
 		r = kvm_list_rootfs();
 		if (r < 0)
 			perror("Error listing rootfs");
+
+		status |= r;
 	}
 
-	return 0;
+	return status;
 }
--
1.7.10.4
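The accumulate-and-return pattern from this patch can be shown on its own. The sketch below is illustrative only: list_step_fn, step_ok(), step_fail() and run_list_steps() are hypothetical names, and the steps stand in for calls such as the two listing functions in the diff.

```c
/* Hypothetical step type: returns 0 on success, negative on failure. */
typedef int (*list_step_fn)(void);

/* Example steps, only here to exercise the pattern. */
static int step_ok(void)
{
	return 0;
}

static int step_fail(void)
{
	return -1;
}

/* Run every step even if an earlier one failed, and remember failures. */
static int run_list_steps(const list_step_fn *steps, int nr_steps)
{
	int status = 0;
	int i;

	for (i = 0; i < nr_steps; i++) {
		int r = steps[i]();

		/* OR-ing keeps status non-zero once any step has failed. */
		status |= r;
	}

	return status;
}
```

One design consequence worth noting: OR-ing two different negative codes does not produce a meaningful error number, so a caller can rely only on zero versus non-zero.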
[PATCH 6/6] kvm tools: powerpc: Only emit TB freq if it's non-zero
The kernel can handle a missing timebase-frequency property much better than one that claims zero.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index dc9f89d..b4b9f82 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -389,7 +389,9 @@ static int setup_fdt(struct kvm *kvm)
 		_FDT(fdt_property_cell(fdt, "dcache-block-size", cpu_info->d_bsize));
 		_FDT(fdt_property_cell(fdt, "icache-block-size", cpu_info->i_bsize));
 
-		_FDT(fdt_property_cell(fdt, "timebase-frequency", cpu_info->tb_freq));
+		if (cpu_info->tb_freq)
+			_FDT(fdt_property_cell(fdt, "timebase-frequency", cpu_info->tb_freq));
+
 		/* Lies, but safeish lies! */
 		_FDT(fdt_property_cell(fdt, "clock-frequency", 0xddbab200));
 
--
1.7.10.4
[PATCH 5/6] kvm tools: powerpc: Add cpu info entry for POWER8
We should hard-code less of this stuff, but for now this works.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/cpu_info.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/kvm/powerpc/cpu_info.c b/tools/kvm/powerpc/cpu_info.c
index 11ca14e..a9dfe39 100644
--- a/tools/kvm/powerpc/cpu_info.c
+++ b/tools/kvm/powerpc/cpu_info.c
@@ -35,6 +35,20 @@ static struct cpu_info cpu_power7_info = {
 	},
 };
 
+/* POWER8 */
+
+static struct cpu_info cpu_power8_info = {
+	.name = "POWER8",
+	.tb_freq = 51200,
+	.d_bsize = 128,
+	.i_bsize = 128,
+	.flags = CPUINFO_FLAG_DFP | CPUINFO_FLAG_VSX | CPUINFO_FLAG_VMX,
+	.mmu_info = {
+		.flags = KVM_PPC_PAGE_SIZES_REAL | KVM_PPC_1T_SEGMENTS,
+		.slb_size = 32,
+	},
+};
+
 /* PPC970/G5 */
 
 static struct cpu_info cpu_970_info = {
@@ -52,6 +66,7 @@ static struct pvr_info host_pvr_info[] = {
 	{ 0x, 0x0f03, &cpu_power7_info },
 	{ 0x, 0x003f, &cpu_power7_info },
 	{ 0x, 0x004a, &cpu_power7_info },
+	{ 0x, 0x004b, &cpu_power8_info },
 	{ 0x, 0x0039, &cpu_970_info },
 	{ 0x, 0x003c, &cpu_970_info },
 	{ 0x, 0x0044, &cpu_970_info },
--
1.7.10.4
Re: [PATCH] tcm_vhost: Multi-queue support
On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
> On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
>> On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
>>> This adds virtio-scsi multi-queue support to tcm_vhost.
>>>
>>> Guest side virtio-scsi multi-queue support can be found here:
>>>
>>> https://lkml.org/lkml/2012/12/18/166
>>>
>>> Some initial perf numbers:
>>> 1 queue, 4 targets, 1 lun per target
>>> 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
>>>
>>> 4 queues, 4 targets, 1 lun per target
>>> 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
>>>
>>
>> Nice single LUN small block random I/O improvement here with 4x vqueues.
>>
>> Curious to see how virtio-scsi small block performance looks with
>> SCSI-core to multi-LUN tcm_vhost endpoints as well.. 8-)
>
> Do you mean something like this?
>
> 1 queue, 2 targets, 2 lun per target
> 4 queue, 2 targets, 2 lun per target
>
>> Btw, this does not apply atop current target-pending.git/for-next with
>> your other pending vhost patch series, and AFAICT this patch is supposed
>> to apply on top of your last PATCH-v3, no..?
>
> Ah, this applies on top of mst's 'tcm_vhost: fix pr_err on early kick
> patch.' plus my last v3 of 'tcm_vhost: Multi-target support'.
>

In that case, applying this patch + PATCH-v3 to auto-next for testing for the moment, and will respin for-next against upstream w/ MST's patch shortly.

Also, please include a proper changelog for this second patch. :)

Thank you!

--nab

>> --nab
>>
>>> Signed-off-by: Asias He as...@redhat.com
>>> ---
>>>  drivers/vhost/tcm_vhost.c | 46 +-
>>>  drivers/vhost/tcm_vhost.h | 2 ++
>>>  2 files changed, 31 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>>> index 81ecda5..9951297 100644
>>> --- a/drivers/vhost/tcm_vhost.c
>>> +++ b/drivers/vhost/tcm_vhost.c
>>> @@ -48,6 +48,7 @@
>>>  #include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
>>>  #include <linux/virtio_scsi.h>
>>>  #include <linux/llist.h>
>>> +#include <linux/bitmap.h>
>>>
>>>  #include "vhost.c"
>>>  #include "vhost.h"
>>> @@ -59,7 +60,8 @@ enum {
>>>  	VHOST_SCSI_VQ_IO = 2,
>>>  };
>>>
>>> -#define VHOST_SCSI_MAX_TARGET 256
>>> +#define VHOST_SCSI_MAX_TARGET	256
>>> +#define VHOST_SCSI_MAX_VQ	128
>>>
>>>  struct vhost_scsi {
>>>  	/* Protected by vhost_scsi->dev.mutex */
>>> @@ -68,7 +70,7 @@ struct vhost_scsi {
>>>  	bool vs_endpoint;
>>>
>>>  	struct vhost_dev dev;
>>> -	struct vhost_virtqueue vqs[3];
>>> +	struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
>>>
>>>  	struct vhost_work vs_completion_work; /* cmd completion work item */
>>>  	struct llist_head vs_completion_list; /* cmd completion queue */
>>> @@ -366,12 +368,14 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>>>  {
>>>  	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
>>>  					vs_completion_work);
>>> +	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
>>>  	struct virtio_scsi_cmd_resp v_rsp;
>>>  	struct tcm_vhost_cmd *tv_cmd;
>>>  	struct llist_node *llnode;
>>>  	struct se_cmd *se_cmd;
>>> -	int ret;
>>> +	int ret, vq;
>>>
>>> +	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
>>>  	llnode = llist_del_all(&vs->vs_completion_list);
>>>  	while (llnode) {
>>>  		tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd,
>>> @@ -390,15 +394,20 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>>>  		memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf,
>>>  		       v_rsp.sense_len);
>>>  		ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp));
>>> -		if (likely(ret == 0))
>>> -			vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0);
>>> -		else
>>> +		if (likely(ret == 0)) {
>>> +			vhost_add_used(tv_cmd->tvc_vq, tv_cmd->tvc_vq_desc, 0);
>>> +			vq = tv_cmd->tvc_vq - vs->vqs;
>>> +			__set_bit(vq, signal);
>>> +		} else
>>>  			pr_err("Faulted on virtio_scsi_cmd_resp\n");
>>>
>>>  		vhost_scsi_free_cmd(tv_cmd);
>>>  	}
>>>
>>> -	vhost_signal(&vs->dev, &vs->vqs[2]);
>>> +	vq = -1;
>>> +	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
>>> +		< VHOST_SCSI_MAX_VQ)
>>> +		vhost_signal(&vs->dev, &vs->vqs[vq]);
>>>  }
>>>
>>>  static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
>>> @@ -561,9 +570,9 @@ static void tcm_vhost_submission_work(struct work_struct *work)
>>>  	}
>>>  }
>>>
>>> -static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
>>> +static void vhost_scsi_handle_vq(struct vhost_scsi *vs,
>>> +	struct vhost_virtqueue *vq)
>>>  {
>>> -	struct vhost_virtqueue *vq = &vs->vqs[2];
>>>  	struct virtio_scsi_cmd_req v_req;
>>>  	struct tcm_vhost_tpg *tv_tpg;
>>>  	struct tcm_vhost_cmd *tv_cmd;
>>> @@ -656,7 +665,7 @@ static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
>>>  			ret = __copy_to_user(resp, &rsp, sizeof(rsp));
>>>  			if (!ret)
>>>  				vhost_add_used_and_signal(&vs->dev,
>>> -
Re: [PATCH] tcm_vhost: Multi-queue support
On 02/06/2013 04:39 PM, Nicholas A. Bellinger wrote:
> On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
>> On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
>>> On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
>>>> This adds virtio-scsi multi-queue support to tcm_vhost.
>>>>
>>>> Guest side virtio-scsi multi-queue support can be found here:
>>>>
>>>> https://lkml.org/lkml/2012/12/18/166
>>>>
>>>> Some initial perf numbers:
>>>> 1 queue, 4 targets, 1 lun per target
>>>> 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
>>>>
>>>> 4 queues, 4 targets, 1 lun per target
>>>> 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
>>>>
>>>
>>> Nice single LUN small block random I/O improvement here with 4x vqueues.
>>>
>>> Curious to see how virtio-scsi small block performance looks with
>>> SCSI-core to multi-LUN tcm_vhost endpoints as well.. 8-)
>>
>> Do you mean something like this?
>>
>> 1 queue, 2 targets, 2 lun per target
>> 4 queue, 2 targets, 2 lun per target
>>
>>> Btw, this does not apply atop current target-pending.git/for-next with
>>> your other pending vhost patch series, and AFAICT this patch is supposed
>>> to apply on top of your last PATCH-v3, no..?
>>
>> Ah, this applies on top of mst's 'tcm_vhost: fix pr_err on early kick
>> patch.' plus my last v3 of 'tcm_vhost: Multi-target support'.
>>
>
> In that case, applying this patch + PATCH-v3 to auto-next for testing
> for the moment, and will respin for-next against upstream w/ MST's patch
> shortly.

Okay. Looking forward to more perf numbers.

> Also, please include a proper changelog for this second patch. :)

Sure.

tcm_vhost: Multi-queue support

This adds virtio-scsi multi-queue support to tcm_vhost. In order to use multi-queue, guest side multi-queue support is needed. It can be found here:

https://lkml.org/lkml/2012/12/18/166

Currently, only one thread is created by vhost core code for each vhost_scsi instance. Even if there are multiple queues, all the handling of guest kicks (vhost_scsi_handle_kick) is processed in one thread. This is not optimal. Luckily, most of the work is offloaded to the tcm_vhost workqueue.

Some initial perf numbers:
1 queue, 4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 127K/127k IOPS

4 queues, 4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 181K/181k IOPS

Signed-off-by: Asias He as...@redhat.com

> Thank you!
>
> --nab
>
>>> --nab
>>>
>>>> Signed-off-by: Asias He as...@redhat.com
>>>> ---
>>>>  drivers/vhost/tcm_vhost.c | 46 +-
>>>>  drivers/vhost/tcm_vhost.h | 2 ++
>>>>  2 files changed, 31 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
>>>> index 81ecda5..9951297 100644
>>>> --- a/drivers/vhost/tcm_vhost.c
>>>> +++ b/drivers/vhost/tcm_vhost.c
>>>> @@ -48,6 +48,7 @@
>>>>  #include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
>>>>  #include <linux/virtio_scsi.h>
>>>>  #include <linux/llist.h>
>>>> +#include <linux/bitmap.h>
>>>>
>>>>  #include "vhost.c"
>>>>  #include "vhost.h"
>>>> @@ -59,7 +60,8 @@ enum {
>>>>  	VHOST_SCSI_VQ_IO = 2,
>>>>  };
>>>>
>>>> -#define VHOST_SCSI_MAX_TARGET 256
>>>> +#define VHOST_SCSI_MAX_TARGET	256
>>>> +#define VHOST_SCSI_MAX_VQ	128
>>>>
>>>>  struct vhost_scsi {
>>>>  	/* Protected by vhost_scsi->dev.mutex */
>>>> @@ -68,7 +70,7 @@ struct vhost_scsi {
>>>>  	bool vs_endpoint;
>>>>
>>>>  	struct vhost_dev dev;
>>>> -	struct vhost_virtqueue vqs[3];
>>>> +	struct vhost_virtqueue vqs[VHOST_SCSI_MAX_VQ];
>>>>
>>>>  	struct vhost_work vs_completion_work; /* cmd completion work item */
>>>>  	struct llist_head vs_completion_list; /* cmd completion queue */
>>>> @@ -366,12 +368,14 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>>>>  {
>>>>  	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
>>>>  					vs_completion_work);
>>>> +	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
>>>>  	struct virtio_scsi_cmd_resp v_rsp;
>>>>  	struct tcm_vhost_cmd *tv_cmd;
>>>>  	struct llist_node *llnode;
>>>>  	struct se_cmd *se_cmd;
>>>> -	int ret;
>>>> +	int ret, vq;
>>>>
>>>> +	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
>>>>  	llnode = llist_del_all(&vs->vs_completion_list);
>>>>  	while (llnode) {
>>>>  		tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd,
>>>> @@ -390,15 +394,20 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>>>>  		memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf,
>>>>  		       v_rsp.sense_len);
>>>>  		ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp));
>>>> -		if (likely(ret == 0))
>>>> -			vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0);
>>>> -		else
>>>> +		if (likely(ret == 0)) {
>>>> +			vhost_add_used(tv_cmd->tvc_vq, tv_cmd->tvc_vq_desc, 0);
>>>> +			vq = tv_cmd->tvc_vq - vs->vqs;
>>>> +			__set_bit(vq, signal);
>>>> +		} else
>>>>  			pr_err("Faulted on virtio_scsi_cmd_resp\n");
>>>>
>>>>  		vhost_scsi_free_cmd(tv_cmd);
>>>>  	}
>>>>
>>>> -	vhost_signal(&vs->dev, &vs->vqs[2]);
>>>> +	vq = -1;
>>>> +	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
>>>> +		< VHOST_SCSI_MAX_VQ)
>>>> +
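The completion path in the quoted patch marks each queue that received a completion in a bitmap and then signals every marked queue once. Below is a userspace sketch of that idea under stated assumptions: it does not use the kernel's bitmap API, all names (SKETCH_MAX_VQ, vq_bitmap_*, signal_completed_queues) are hypothetical, and writing to an output array stands in for the real signalling call.

```c
#include <limits.h>
#include <string.h>

#define SKETCH_MAX_VQ	128
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	((SKETCH_MAX_VQ + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical userspace bitmap with one bit per virtqueue. */
struct vq_bitmap {
	unsigned long words[BITMAP_WORDS];
};

static void vq_bitmap_zero(struct vq_bitmap *b)
{
	memset(b, 0, sizeof(*b));
}

/* Caller must pass vq < SKETCH_MAX_VQ. */
static void vq_bitmap_set(struct vq_bitmap *b, unsigned int vq)
{
	b->words[vq / BITS_PER_WORD] |= 1UL << (vq % BITS_PER_WORD);
}

/* First set bit at or after 'start', or SKETCH_MAX_VQ if there is none. */
static unsigned int vq_bitmap_next(const struct vq_bitmap *b, unsigned int start)
{
	unsigned int vq;

	for (vq = start; vq < SKETCH_MAX_VQ; vq++)
		if (b->words[vq / BITS_PER_WORD] & (1UL << (vq % BITS_PER_WORD)))
			return vq;

	return SKETCH_MAX_VQ;
}

/*
 * Mark the queue of every completion, then visit each marked queue once.
 * Queue ids are written to 'signalled' in ascending order; the return
 * value is how many distinct queues were visited.
 */
static unsigned int signal_completed_queues(const unsigned int *completed_vqs,
					    unsigned int nr,
					    unsigned int *signalled)
{
	struct vq_bitmap pending;
	unsigned int i, vq, count = 0;

	vq_bitmap_zero(&pending);
	for (i = 0; i < nr; i++)
		vq_bitmap_set(&pending, completed_vqs[i]);

	for (vq = vq_bitmap_next(&pending, 0); vq < SKETCH_MAX_VQ;
	     vq = vq_bitmap_next(&pending, vq + 1))
		signalled[count++] = vq;	/* stand-in for signalling the queue */

	return count;
}
```

The point of the bitmap is deduplication: many completions on the same queue collapse into a single signal for that queue.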
Re: [PATCH 1/6] kvm tools: Return error status in lkvm list
Applied all patches, thanks a lot Michael!
Re: [PATCH] tcm_vhost: Multi-queue support
----- Original Message -----
> From: Asias He as...@redhat.com
> To: Nicholas A. Bellinger n...@linux-iscsi.org
> Cc: Paolo Bonzini pbonz...@redhat.com, Stefan Hajnoczi stefa...@redhat.com, Michael S. Tsirkin m...@redhat.com, Rusty Russell ru...@rustcorp.com.au, kvm@vger.kernel.org, virtualizat...@lists.linux-foundation.org, target-de...@vger.kernel.org
> Sent: Wednesday, 6 February 2013 10:51:34
> Subject: Re: [PATCH] tcm_vhost: Multi-queue support
>
> On 02/06/2013 04:39 PM, Nicholas A. Bellinger wrote:
>> On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
>>> On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
>>>> On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
>>>>> This adds virtio-scsi multi-queue support to tcm_vhost.
>>>>>
>>>>> Guest side virtio-scsi multi-queue support can be found here:
>>>>>
>>>>> https://lkml.org/lkml/2012/12/18/166
>>>>>
>>>>> Some initial perf numbers:
>>>>> 1 queue, 4 targets, 1 lun per target
>>>>> 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
>>>>>
>>>>> 4 queues, 4 targets, 1 lun per target
>>>>> 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS

4 VCPUs I suppose?

Paolo
Re: [PATCH v2 07/18] KVM/MIPS32: MMU/TLB operations for the Guest.
On Wed, Nov 21, 2012 at 06:34:05PM -0800, Sanjay Lal wrote:
> - Note that this file is statically linked with the rest of the host kernel (KSEG0).
>   This is because kernel modules are loaded into mapped space on MIPS and we want
>   to make sure that we don't get any host kernel TLB faults while manipulating TLBs.
> - Virtual Guest TLBs are implemented as 64 entry array regardless of the number of host TLB entries.
> - Shadow TLBs map Guest virtual addresses to Host physical addresses.
>
> - TLB miss handling details:
>   Guest KSEG0 TLBMISS (0x4000 – 0x6000): Transparent to the Guest.
>   Guest KSEG2/3 (0x6000 – 0x8000)
>   Guest UM TLBMISS (0x – 0x4000)
>     Lookup in Guest/Virtual TLB
>     If an entry doesn’t match
>       deliver appropriate TLBMISS LD/ST exception to the guest
>     If entry does exist in the Guest TLB and is NOT Valid
>       Deliver TLB invalid exception to the guest
>     If entry does exist in the Guest TLB and is VALID
>       Inject the TLB entry into the Shadow TLB
>
> Signed-off-by: Sanjay Lal sanj...@kymasys.com
> ---
>  arch/mips/kvm/kvm_tlb.c | 932
>  1 file changed, 932 insertions(+)
>  create mode 100644 arch/mips/kvm/kvm_tlb.c
>
> diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
> new file mode 100644
> index 000..2d24333
> --- /dev/null
> +++ b/arch/mips/kvm/kvm_tlb.c
> @@ -0,0 +1,932 @@
> +/*
> +* This file is subject to the terms and conditions of the GNU General Public
> +* License. See the file COPYING in the main directory of this archive
> +* for more details.
> +*
> +* KVM/MIPS TLB handling, this file is part of the Linux host kernel so that
> +* TLB handlers run from KSEG0
> +*
> +* Copyright (C) 2012 MIPS Technologies, Inc. All rights reserved.
> +* Authors: Sanjay Lal sanj...@kymasys.com
> +*/
> +
> +#include <linux/init.h>
> +#include <linux/sched.h>
> +#include <linux/smp.h>
> +#include <linux/mm.h>
> +#include <linux/delay.h>
> +#include <linux/module.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/cpu.h>
> +#include <asm/bootinfo.h>
> +#include <asm/mmu_context.h>
> +#include <asm/pgtable.h>
> +#include <asm/cacheflush.h>
> +
> +#undef CONFIG_MIPS_MT
> +#include <asm/r4kcache.h>
> +#define CONFIG_MIPS_MT
> +
> +#define KVM_GUEST_PC_TLB	0
> +#define KVM_GUEST_SP_TLB	1
> +
> +#define PRIx64 "llx"
> +
> +/* Use VZ EntryHi.EHINV to invalidate TLB entries */
> +#define UNIQUE_ENTRYHI(idx) (CKSEG0 + ((idx) << (PAGE_SHIFT + 1)))
> +
> +atomic_t kvm_mips_instance;
> +EXPORT_SYMBOL(kvm_mips_instance);
> +
> +/* These function pointers are initialized once the KVM module is loaded */
> +pfn_t(*kvm_mips_gfn_to_pfn) (struct kvm *kvm, gfn_t gfn);
> +EXPORT_SYMBOL(kvm_mips_gfn_to_pfn);
> +
> +void (*kvm_mips_release_pfn_clean) (pfn_t pfn);
> +EXPORT_SYMBOL(kvm_mips_release_pfn_clean);
> +
> +bool(*kvm_mips_is_error_pfn) (pfn_t pfn);
> +EXPORT_SYMBOL(kvm_mips_is_error_pfn);
> +
> +uint32_t kvm_mips_get_kernel_asid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.guest_kernel_asid[smp_processor_id()] & ASID_MASK;
> +}
> +
> +
> +uint32_t kvm_mips_get_user_asid(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.guest_user_asid[smp_processor_id()] & ASID_MASK;
> +}
> +
> +inline uint32_t kvm_mips_get_commpage_asid (struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->kvm->arch.commpage_tlb;
> +}
> +
> +
> +/*
> + * Structure defining an tlb entry data set.
> + */
> +
> +void kvm_mips_dump_host_tlbs(void)
> +{
> +	struct kvm_mips_tlb tlb;
> +	int i;
> +	ulong flags;
> +	unsigned long old_entryhi;
> +	unsigned long old_pagemask;
> +
> +	local_irq_save(flags);
> +
> +	old_entryhi = read_c0_entryhi();
> +	old_pagemask = read_c0_pagemask();
> +
> +	printk("HOST TLBs:\n");
> +	printk("ASID: %#lx\n", read_c0_entryhi() & ASID_MASK);
> +
> +	for (i = 0; i < current_cpu_data.tlbsize; i++) {
> +		write_c0_index(i);
> +		mtc0_tlbw_hazard();
> +
> +		tlb_read();
> +		tlbw_use_hazard();
> +
> +		tlb.tlb_hi = read_c0_entryhi();
> +		tlb.tlb_lo0 = read_c0_entrylo0();
> +		tlb.tlb_lo1 = read_c0_entrylo1();
> +		tlb.tlb_mask = read_c0_pagemask();
> +
> +		printk("TLB%c%3d Hi 0x%08lx ",
> +		       (tlb.tlb_lo0 | tlb.tlb_lo1) & MIPS3_PG_V ? ' ' : '*',
> +		       i, tlb.tlb_hi);
> +		printk("Lo0=0x%09" PRIx64 " %c%c attr %lx ",
> +		       (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo0),
> +		       (tlb.tlb_lo0 & MIPS3_PG_D) ? 'D' : ' ',
> +		       (tlb.tlb_lo0 & MIPS3_PG_G) ? 'G' : ' ',
> +		       (tlb.tlb_lo0 >> 3) & 7);
> +		printk("Lo1=0x%09" PRIx64 " %c%c attr %lx sz=%lx\n",
> +		       (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo1),
> +		       (tlb.tlb_lo1 & MIPS3_PG_D) ? 'D' : ' ',
> +		       (tlb.tlb_lo1 & MIPS3_PG_G) ? 'G' : ' ',
> +		       (tlb.tlb_lo1 >> 3) & 7, tlb.tlb_mask);
> +	}
> +
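The three-way decision listed under "TLB miss handling details" in the patch description can be sketched as a small lookup. This is a deliberately simplified model, not the patch's code: the entry layout, the matching on a single page number plus ASID, and every name below are assumptions made for illustration (a real MIPS TLB entry covers a page pair and carries per-page valid bits).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GUEST_TLB_ENTRIES 64	/* the description mentions a 64-entry array */

/* What to do after a miss on a guest KSEG2/3 or user-mode address. */
enum tlb_action {
	DELIVER_TLBMISS,	/* no matching guest entry */
	DELIVER_TLBINV,		/* entry matches but is not valid */
	INJECT_SHADOW		/* entry matches and is valid */
};

/* Hypothetical, simplified guest TLB entry. */
struct guest_tlb_entry {
	uint32_t vpn;
	uint8_t asid;
	bool in_use;
	bool valid;
};

static enum tlb_action guest_tlb_miss_action(const struct guest_tlb_entry *tlb,
					     size_t nr_entries,
					     uint32_t vpn, uint8_t asid)
{
	size_t i;

	for (i = 0; i < nr_entries; i++) {
		if (!tlb[i].in_use || tlb[i].vpn != vpn || tlb[i].asid != asid)
			continue;

		/* Found the guest's mapping: valid entries go to the shadow TLB. */
		return tlb[i].valid ? INJECT_SHADOW : DELIVER_TLBINV;
	}

	/* The guest has no mapping at all, so it must handle the miss itself. */
	return DELIVER_TLBMISS;
}
```

Guest KSEG0 misses are not modelled here, since the description says those are handled transparently to the guest.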
[PATCH] kvm tools: arm: fix GIC #defines to match latest kvm code
During the review process for the KVM ARM patches, the GIC device registration was subjected to some minor renaming, so update kvm tool appropriately.

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/arm/gic.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c
index 3f42c3a..8d2ff87 100644
--- a/tools/kvm/arm/gic.c
+++ b/tools/kvm/arm/gic.c
@@ -22,15 +22,15 @@ int gic__alloc_irqnum(void)
 int gic__init_irqchip(struct kvm *kvm)
 {
 	int err;
-	struct kvm_device_address gic_addr[] = {
+	struct kvm_arm_device_addr gic_addr[] = {
 		[0] = {
-			.id = (KVM_ARM_DEVICE_VGIC_V2 << KVM_DEVICE_ID_SHIFT) |\
-				KVM_VGIC_V2_ADDR_TYPE_DIST,
+			.id = KVM_VGIC_V2_ADDR_TYPE_DIST |
+				(KVM_ARM_DEVICE_VGIC_V2 << KVM_ARM_DEVICE_ID_SHIFT),
 			.addr = ARM_GIC_DIST_BASE,
 		},
 		[1] = {
-			.id = (KVM_ARM_DEVICE_VGIC_V2 << KVM_DEVICE_ID_SHIFT) |\
-				KVM_VGIC_V2_ADDR_TYPE_CPU,
+			.id = KVM_VGIC_V2_ADDR_TYPE_CPU |
+				(KVM_ARM_DEVICE_VGIC_V2 << KVM_ARM_DEVICE_ID_SHIFT),
 			.addr = ARM_GIC_CPUI_BASE,
 		}
 	};
@@ -45,11 +45,11 @@ int gic__init_irqchip(struct kvm *kvm)
 	if (err)
 		return err;
 
-	err = ioctl(kvm->vm_fd, KVM_SET_DEVICE_ADDRESS, &gic_addr[0]);
+	err = ioctl(kvm->vm_fd, KVM_ARM_SET_DEVICE_ADDR, &gic_addr[0]);
 	if (err)
 		return err;
 
-	err = ioctl(kvm->vm_fd, KVM_SET_DEVICE_ADDRESS, &gic_addr[1]);
+	err = ioctl(kvm->vm_fd, KVM_ARM_SET_DEVICE_ADDR, &gic_addr[1]);
 	return err;
 }
 
--
1.8.0
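The .id fields in this patch pack a device id and an address type into one value: the address type sits in the low bits and the device id is shifted above it. The sketch below shows only that packing. The EX_ constants are illustrative placeholders chosen for the example, not values quoted from the kernel headers, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Illustrative placeholder values; the real ones come from the KVM headers. */
#define EX_DEVICE_ID_SHIFT	16
#define EX_DEVICE_VGIC_V2	0
#define EX_ADDR_TYPE_DIST	0
#define EX_ADDR_TYPE_CPU	1

/* Pack the address type into the low bits and the device id above it. */
static uint64_t ex_device_addr_id(uint64_t device, uint64_t addr_type)
{
	return addr_type | (device << EX_DEVICE_ID_SHIFT);
}

/* Recover the device id from a packed value. */
static uint64_t ex_id_device(uint64_t id)
{
	return id >> EX_DEVICE_ID_SHIFT;
}

/* Recover the address type from a packed value. */
static uint64_t ex_id_addr_type(uint64_t id)
{
	return id & (((uint64_t)1 << EX_DEVICE_ID_SHIFT) - 1);
}
```

Because the two fields occupy disjoint bit ranges, the order of the operands around the OR does not matter, which is why the patch could swap them while renaming.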
Re: [PATCH] kvm tools: arm: fix GIC #defines to match latest kvm code
On Wed, Feb 6, 2013 at 2:12 PM, Will Deacon will.dea...@arm.com wrote:
> During the review process for the KVM ARM patches, the GIC device
> registration was subjected to some minor renaming, so update kvm tool
> appropriately.
>
> Signed-off-by: Will Deacon will.dea...@arm.com

Applied, thanks Will!
Re: [PATCH v2 02/18] KVM/MIPS32: Arch specific KVM data structures.
On Wed, Nov 21, 2012 at 06:34:00PM -0800, Sanjay Lal wrote:
> +struct kvm_mips_callbacks {
> +	int (*handle_cop_unusable) (struct kvm_vcpu *vcpu);
> +	int (*handle_tlb_mod) (struct kvm_vcpu *vcpu);
> +	int (*handle_tlb_ld_miss) (struct kvm_vcpu *vcpu);
> +	int (*handle_tlb_st_miss) (struct kvm_vcpu *vcpu);
> +	int (*handle_addr_err_st) (struct kvm_vcpu *vcpu);
> +	int (*handle_addr_err_ld) (struct kvm_vcpu *vcpu);
> +	int (*handle_syscall) (struct kvm_vcpu *vcpu);
> +	int (*handle_res_inst) (struct kvm_vcpu *vcpu);
> +	int (*handle_break) (struct kvm_vcpu *vcpu);
> +	int (*vm_init) (struct kvm *kvm);
> +	int (*vcpu_init) (struct kvm_vcpu *vcpu);
> +	int (*vcpu_setup) (struct kvm_vcpu *vcpu);
> +	gpa_t(*gva_to_gpa) (gva_t gva);
> +	void (*queue_timer_int) (struct kvm_vcpu *vcpu);
> +	void (*dequeue_timer_int) (struct kvm_vcpu *vcpu);
> +	void (*queue_io_int) (struct kvm_vcpu *vcpu,
> +			      struct kvm_mips_interrupt *irq);
> +	void (*dequeue_io_int) (struct kvm_vcpu *vcpu,
> +				struct kvm_mips_interrupt *irq);
> +	int (*irq_deliver) (struct kvm_vcpu *vcpu, unsigned int priority,
> +			    uint32_t cause);
> +	int (*irq_clear) (struct kvm_vcpu *vcpu, unsigned int priority,
> +			  uint32_t cause);
> +	int (*vcpu_ioctl_get_regs) (struct kvm_vcpu *vcpu,
> +				    struct kvm_regs *regs);
> +	int (*vcpu_ioctl_set_regs) (struct kvm_vcpu *vcpu,
> +				    struct kvm_regs *regs);
> +};
You haven't addressed Avi's comment about dropping the interaction and adding it later, when other HW is supported and the best way to do the split is known.

--
			Gleb.
Re: [PATCH v2 09/18] KVM/MIPS32: COP0 accesses profiling.
On Wed, Nov 21, 2012 at 06:34:07PM -0800, Sanjay Lal wrote:
> Signed-off-by: Sanjay Lal sanj...@kymasys.com
> ---
>  arch/mips/kvm/kvm_mips_stats.c | 81 ++
>  1 file changed, 81 insertions(+)
>  create mode 100644 arch/mips/kvm/kvm_mips_stats.c
>
> diff --git a/arch/mips/kvm/kvm_mips_stats.c b/arch/mips/kvm/kvm_mips_stats.c
> new file mode 100644
> index 000..e442a26
> --- /dev/null
> +++ b/arch/mips/kvm/kvm_mips_stats.c
> @@ -0,0 +1,81 @@
> +/*
> +* This file is subject to the terms and conditions of the GNU General Public
> +* License. See the file COPYING in the main directory of this archive
> +* for more details.
> +*
> +* KVM/MIPS: COP0 access histogram
> +*
> +* Copyright (C) 2012 MIPS Technologies, Inc. All rights reserved.
> +* Authors: Sanjay Lal sanj...@kymasys.com
> +*/
> +
> +#include <linux/kvm_host.h>
> +
> +char *kvm_mips_exit_types_str[MAX_KVM_MIPS_EXIT_TYPES] = {
> +	"WAIT",
> +	"CACHE",
> +	"Signal",
> +	"Interrupt",
> +	"COP0/1 Unusable",
> +	"TLB Mod",
> +	"TLB Miss (LD)",
> +	"TLB Miss (ST)",
> +	"Address Err (ST)",
> +	"Address Error (LD)",
> +	"System Call",
> +	"Reserved Inst",
> +	"Break Inst",
> +	"D-Cache Flushes",
> +};
> +
> +char *kvm_cop0_str[N_MIPS_COPROC_REGS] = {
> +	"Index",
> +	"Random",
> +	"EntryLo0",
> +	"EntryLo1",
> +	"Context",
> +	"PG Mask",
> +	"Wired",
> +	"HWREna",
> +	"BadVAddr",
> +	"Count",
> +	"EntryHI",
> +	"Compare",
> +	"Status",
> +	"Cause",
> +	"EXC PC",
> +	"PRID",
> +	"Config",
> +	"LLAddr",
> +	"Watch Lo",
> +	"Watch Hi",
> +	"X Context",
> +	"Reserved",
> +	"Impl Dep",
> +	"Debug",
> +	"DEPC",
> +	"PerfCnt",
> +	"ErrCtl",
> +	"CacheErr",
> +	"TagLo",
> +	"TagHi",
> +	"ErrorEPC",
> +	"DESAVE"
> +};
> +
> +int kvm_mips_dump_stats(struct kvm_vcpu *vcpu)
> +{
> +	int i, j __unused;
> +#ifdef CONFIG_KVM_MIPS_DEBUG_COP0_COUNTERS
> +	printk("\nKVM VCPU[%d] COP0 Access Profile:\n", vcpu->vcpu_id);
> +	for (i = 0; i < N_MIPS_COPROC_REGS; i++) {
> +		for (j = 0; j < N_MIPS_COPROC_SEL; j++) {
> +			if (vcpu->arch.cop0->stat[i][j])
> +				printk("%s[%d]: %lu\n", kvm_cop0_str[i], j,
> +				       vcpu->arch.cop0->stat[i][j]);
> +		}
> +	}
> +#endif
> +
> +	return 0;
> +}
You need to use an ftrace event for that. It is much more flexible, has perf integration, and needs no recompile to enable/disable.

--
			Gleb.
Re: [PATCH v2 11/18] KVM/MIPS32: Routines to handle specific traps/exceptions while executing the guest.
On Wed, Nov 21, 2012 at 06:34:09PM -0800, Sanjay Lal wrote:
> +static gpa_t kvm_trap_emul_gva_to_gpa_cb(gva_t gva)
> +{
> +	gpa_t gpa;
> +	uint32_t kseg = KSEGX(gva);
> +
> +	if ((kseg == CKSEG0) || (kseg == CKSEG1))
You seem to be using KVM_GUEST_KSEGX variants on gva in all other places. Why not here?

> +		gpa = CPHYSADDR(gva);
> +	else {
> +		printk("%s: cannot find GPA for GVA: %#lx\n", __func__, gva);
> +		kvm_mips_dump_host_tlbs();
> +		gpa = KVM_INVALID_ADDR;
> +	}
> +
> +#ifdef DEBUG
> +	kvm_debug("%s: gva %#lx, gpa: %#llx\n", __func__, gva, gpa);
> +#endif
> +
> +	return gpa;
> +}
> +

--
			Gleb.
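For readers unfamiliar with the macros in the quoted function, here is a userspace sketch of the same translation for 32-bit MIPS unmapped segments. It mirrors the host-segment logic shown in the quote and does not address the question raised about the guest-segment variants. All EX_ names are hypothetical; the masks follow the usual 32-bit MIPS layout, where the top three address bits select the segment and the low 29 bits give the physical address for KSEG0/KSEG1.

```c
#include <stdint.h>

#define EX_KSEG_MASK	0xe0000000u	/* top three bits select the segment */
#define EX_CKSEG0	0x80000000u	/* unmapped, cached */
#define EX_CKSEG1	0xa0000000u	/* unmapped, uncached */
#define EX_PHYS_MASK	0x1fffffffu	/* low 29 bits: physical address */
#define EX_INVALID_ADDR	UINT64_MAX	/* hypothetical "no translation" marker */

/* Translate an address in an unmapped segment; anything else is rejected. */
static uint64_t ex_gva_to_gpa(uint32_t gva)
{
	uint32_t kseg = gva & EX_KSEG_MASK;

	if (kseg == EX_CKSEG0 || kseg == EX_CKSEG1)
		return gva & EX_PHYS_MASK;

	/* Mapped segments need a TLB lookup, which this sketch does not do. */
	return EX_INVALID_ADDR;
}
```

Both unmapped segments alias the same physical range, so the cached and uncached views of one address translate to the same result.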
Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.
On 02/06/2013 01:49 AM, Michael Wolf wrote:
> Add a helper routine to scheduler/core.c to allow the kvm module to
> retrieve the cpu hardlimit settings. The values will be used to set up
> a timer that is used to separate the consigned from the steal time.

Sorry: What is the business of a timer in here?

Whenever we read steal time, we know how much time has passed and with that information we can know the entitlement for the period. This breaks if we suspend, but we know that we suspended, so this is not a problem.

Everything bigger than the entitlement is steal time.
Re: [PATCH 2/2] virtio-scsi: reset virtqueue affinity when doing cpu hotplug
On 16/01/2013 04:55, Wanlong Gao wrote:
>>> Add hot cpu notifier to reset the request virtqueue affinity
>>> when doing cpu hotplug.
>>
>> You need to be careful to get_online_cpus() and put_online_cpus() here,
>> so CPUs can't go up and down in the middle of operations.
>>
>> In particular, get_online_cpus()/put_online_cpus() around calls to
>> virtscsi_set_affinity() (except within notifiers).
>
> Yes, I'll take care of this, thank you.

I squashed patch 1 (plus changes to get/put_online_cpus) in my multiqueue series, and applied this one as a separate patch.

Paolo
Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.
On 02/06/2013 08:36 AM, Glauber Costa wrote:
> On 02/06/2013 01:49 AM, Michael Wolf wrote:
>> Add a helper routine to scheduler/core.c to allow the kvm module to
>> retrieve the cpu hardlimit settings. The values will be used to set up
>> a timer that is used to separate the consigned from the steal time.
>
> Sorry: What is the business of a timer in here?
>
> Whenever we read steal time, we know how much time has passed and with
> that information we can know the entitlement for the period. This breaks
> if we suspend, but we know that we suspended, so this is not a problem.

I may be missing something, but how do we know how much time has passed? That is why I had the timer in there. I will go look again at the code, but I thought the data was collected as ticks and passed at random times. The ticks are also accumulating, so we are looking at the difference in the count between reads.

> Everything bigger than the entitlement is steal time.

I agree, provided I know the total amount of time over which the steal time was accumulated.
Re: DMAR faults from unrelated device when vfio is used
Hi,

On Tue, 05 Feb 2013 13:36:53 -0700, Alex Williamson alex.william...@redhat.com wrote:
> Ugh, the infamous and useless error 10. It could be anything. I've got
> a system with onboard usb3, let me see what windows does with it here
> first. Thanks,
>
> Well, I've got an Etron USB3 HBA and (un)fortunately it works just fine
> with a Win7 guest. There's really nothing special about USB controllers
> from a PCI device assignment perspective. Have you tried the latest
> upstream qemu bits? Thanks,

USB3 also does not work within a Linux guest.
xhci in debug mode gives a bit more info.

[1.157888] xhci_hcd :00:07.0: xHCI Host Controller
[1.157899] xhci_hcd :00:07.0: new USB bus registered, assigned bus number 4
[1.157948] xhci_hcd :00:07.0: // Halt the HC
[1.157957] xhci_hcd :00:07.0: Resetting HCD
[1.157962] xhci_hcd :00:07.0: // Reset the HC
[1.158111] usb 3-1: new full-speed USB device number 2 using uhci_hcd
[1.158125] xhci_hcd :00:07.0: Wait for controller to be ready for doorbell rings
[1.158130] xhci_hcd :00:07.0: Reset complete
[1.158133] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses.
[1.158135] xhci_hcd :00:07.0: Calling HCD init
[1.158136] xhci_hcd :00:07.0: xhci_init
[1.158137] xhci_hcd :00:07.0: xHCI doesn't need link TRB QUIRK
[1.158640] xhci_hcd :00:07.0: Finished xhci_init
[1.158642] xhci_hcd :00:07.0: Called HCD init
[1.158698] xhci_hcd :00:07.0: irq 11, io mem 0xfebf4000
[1.158699] xhci_hcd :00:07.0: xhci_run
[1.159578] xhci_hcd :00:07.0: irq 40 for MSI/MSI-X
[1.159697] xhci_hcd :00:07.0: irq 41 for MSI/MSI-X
[1.159720] xhci_hcd :00:07.0: irq 42 for MSI/MSI-X
[1.159736] xhci_hcd :00:07.0: irq 43 for MSI/MSI-X
[1.159752] xhci_hcd :00:07.0: irq 44 for MSI/MSI-X
[1.179682] xhci_hcd :00:07.0: Setting event ring polling timer
[1.179686] xhci_hcd :00:07.0: Command ring memory map follows:
[1.179693] xhci_hcd :00:07.0: ERST memory map follows:
[1.179695] xhci_hcd :00:07.0: Event ring:
[1.179702] xhci_hcd :00:07.0: ERST deq = 64'h36820400
[1.179703] xhci_hcd :00:07.0: // Set the interrupt modulation register
[1.179710] xhci_hcd :00:07.0: // Enable interrupts, cmd = 0x4.
[1.179715] xhci_hcd :00:07.0: // Enabling event ring interrupter c9e68620 by writing 0x2 to irq_pending
[1.179737] xhci_hcd :00:07.0: Finished xhci_run for USB2 roothub
[1.179752] usb usb4: New USB device found, idVendor=1d6b, idProduct=0002
[1.179753] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[1.179755] usb usb4: Product: xHCI Host Controller
[1.179756] usb usb4: Manufacturer: Linux 3.8.0-rc6-2.10-desktop xhci_hcd
[1.179757] usb usb4: SerialNumber: :00:07.0
[1.179967] xHCI xhci_add_endpoint called for root hub
[1.179971] xHCI xhci_check_bandwidth called for root hub
[1.180081] hub 4-0:1.0: USB hub found
[1.180094] hub 4-0:1.0: 2 ports detected
[1.180200] xhci_hcd :00:07.0: xHCI Host Controller
[1.180206] xhci_hcd :00:07.0: new USB bus registered, assigned bus number 5
[1.180214] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses.
[1.180219] xhci_hcd :00:07.0: // Turn on HC, cmd = 0x5.
[1.245201] xhci_hcd :00:07.0: Host took too long to start, waited 16000 microseconds.

This one looks interesting.

[1.245414] xhci_hcd :00:07.0: // Halt the HC
[1.245424] xhci_hcd :00:07.0: startup error -19
[1.245551] xhci_hcd :00:07.0: USB bus 5 deregistered
[1.245556] xhci_hcd :00:07.0: remove, state 1
[1.245560] usb usb4: USB disconnect, device number 1
[1.245608] xHCI xhci_drop_endpoint called for root hub
[1.245609] xHCI xhci_check_bandwidth called for root hub
[1.245684] xhci_hcd :00:07.0: // Halt the HC
[1.245695] xhci_hcd :00:07.0: // Reset the HC
[1.245741] xhci_hcd :00:07.0: Wait for controller to be ready for doorbell rings
[1.256413] xhci_hcd :00:07.0: // Disabling event ring interrupts
[1.256427] xhci_hcd :00:07.0: cleaning up memory
[1.256440] xhci_hcd :00:07.0: xhci_stop completed - status = 1
[1.256446] xhci_hcd :00:07.0: USB bus 4 deregistered
[1.258194] ata_piix :00:01.1: version 2.13

Within the guest, lspci -vv gives:

00:07.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) (prog-if 30 [XHCI])
	Subsystem: Intel Corporation Device 2008
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at febf4000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
		Flags:
Re: DMAR faults from unrelated device when vfio is used
On Wed, 2013-02-06 at 19:09 +0100, Richard Weinberger wrote: Hi, Am Tue, 05 Feb 2013 13:36:53 -0700 schrieb Alex Williamson alex.william...@redhat.com: Ugh, the infamous and useless error 10. It could be anything. I've got a system with onboard usb3, let me see what windows does with it here first. Thanks, Well, I've got an Etron USB3 HBA and (un)fortunately it works just fine with a Win7 guest. There's really nothing special about USB controllers from a PCI device assignment perspective. Have you tried the latest upstream qemu bits? Thanks, USB3 does also not work within a Linux guest. xhci in debug mode gives a bit more infos. Does the card work with pci-assign or are both broken? [1.157888] xhci_hcd :00:07.0: xHCI Host Controller [1.157899] xhci_hcd :00:07.0: new USB bus registered, assigned bus number 4 [1.157948] xhci_hcd :00:07.0: // Halt the HC [1.157957] xhci_hcd :00:07.0: Resetting HCD [1.157962] xhci_hcd :00:07.0: // Reset the HC [1.158111] usb 3-1: new full-speed USB device number 2 using uhci_hcd [1.158125] xhci_hcd :00:07.0: Wait for controller to be ready for doorbell rings [1.158130] xhci_hcd :00:07.0: Reset complete [1.158133] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses. 
[1.158135] xhci_hcd :00:07.0: Calling HCD init [1.158136] xhci_hcd :00:07.0: xhci_init [1.158137] xhci_hcd :00:07.0: xHCI doesn't need link TRB QUIRK [1.158640] xhci_hcd :00:07.0: Finished xhci_init [1.158642] xhci_hcd :00:07.0: Called HCD init [1.158698] xhci_hcd :00:07.0: irq 11, io mem 0xfebf4000 [1.158699] xhci_hcd :00:07.0: xhci_run [1.159578] xhci_hcd :00:07.0: irq 40 for MSI/MSI-X [1.159697] xhci_hcd :00:07.0: irq 41 for MSI/MSI-X [1.159720] xhci_hcd :00:07.0: irq 42 for MSI/MSI-X [1.159736] xhci_hcd :00:07.0: irq 43 for MSI/MSI-X [1.159752] xhci_hcd :00:07.0: irq 44 for MSI/MSI-X [1.179682] xhci_hcd :00:07.0: Setting event ring polling timer [1.179686] xhci_hcd :00:07.0: Command ring memory map follows: [1.179693] xhci_hcd :00:07.0: ERST memory map follows: [1.179695] xhci_hcd :00:07.0: Event ring: [1.179702] xhci_hcd :00:07.0: ERST deq = 64'h36820400 [1.179703] xhci_hcd :00:07.0: // Set the interrupt modulation register [1.179710] xhci_hcd :00:07.0: // Enable interrupts, cmd = 0x4. [1.179715] xhci_hcd :00:07.0: // Enabling event ring interrupter c9e68620 by writing 0x2 to irq_pending [1.179737] xhci_hcd :00:07.0: Finished xhci_run for USB2 roothub [1.179752] usb usb4: New USB device found, idVendor=1d6b, idProduct=0002 [1.179753] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [1.179755] usb usb4: Product: xHCI Host Controller [1.179756] usb usb4: Manufacturer: Linux 3.8.0-rc6-2.10-desktop xhci_hcd [1.179757] usb usb4: SerialNumber: :00:07.0 [1.179967] xHCI xhci_add_endpoint called for root hub [1.179971] xHCI xhci_check_bandwidth called for root hub [1.180081] hub 4-0:1.0: USB hub found [1.180094] hub 4-0:1.0: 2 ports detected [1.180200] xhci_hcd :00:07.0: xHCI Host Controller [1.180206] xhci_hcd :00:07.0: new USB bus registered, assigned bus number 5 [1.180214] xhci_hcd :00:07.0: Enabling 64-bit DMA addresses. [1.180219] xhci_hcd :00:07.0: // Turn on HC, cmd = 0x5. 
[1.245201] xhci_hcd :00:07.0: Host took too long to start, waited 16000 microseconds. This one looks interesting. Yep, the register never got to the state it was looking for. [1.245414] xhci_hcd :00:07.0: // Halt the HC [1.245424] xhci_hcd :00:07.0: startup error -19 [1.245551] xhci_hcd :00:07.0: USB bus 5 deregistered [1.245556] xhci_hcd :00:07.0: remove, state 1 [1.245560] usb usb4: USB disconnect, device number 1 [1.245608] xHCI xhci_drop_endpoint called for root hub [1.245609] xHCI xhci_check_bandwidth called for root hub [1.245684] xhci_hcd :00:07.0: // Halt the HC [1.245695] xhci_hcd :00:07.0: // Reset the HC [1.245741] xhci_hcd :00:07.0: Wait for controller to be ready for doorbell rings [1.256413] xhci_hcd :00:07.0: // Disabling event ring interrupts [1.256427] xhci_hcd :00:07.0: cleaning up memory [1.256440] xhci_hcd :00:07.0: xhci_stop completed - status = 1 [1.256446] xhci_hcd :00:07.0: USB bus 4 deregistered [1.258194] ata_piix :00:01.1: version 2.13 Within the guest lscpi -vv gives: 00:07.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) (prog-if 30 [XHCI]) Subsystem: Intel Corporation Device 2008 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+
[PATCH 66/77] vfio: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

v2: Restore accidentally dropped index 0 comment as suggested by Alex.

Signed-off-by: Tejun Heo t...@kernel.org
Acked-by: Alex Williamson alex.william...@redhat.com
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12c264d..7f61abf 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -139,23 +139,8 @@ EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
  */
 static int vfio_alloc_group_minor(struct vfio_group *group)
 {
-	int ret, minor;
-
-again:
-	if (unlikely(idr_pre_get(&vfio.group_idr, GFP_KERNEL) == 0))
-		return -ENOMEM;
 	/* index 0 is used by /dev/vfio/vfio */
-	ret = idr_get_new_above(&vfio.group_idr, group, 1, &minor);
-	if (ret == -EAGAIN)
-		goto again;
-	if (ret || minor > MINORMASK) {
-		if (minor > MINORMASK)
-			idr_remove(&vfio.group_idr, minor);
-		return -ENOSPC;
-	}
-
-	return minor;
+	return idr_alloc(&vfio.group_idr, group, 1, MINORMASK + 1, GFP_KERNEL);
 }
 
 static void vfio_free_group_minor(int minor)
--
1.8.1
Re: DMAR faults from unrelated device when vfio is used
Hi, Am Wed, 06 Feb 2013 11:47:20 -0700 schrieb Alex Williamson alex.william...@redhat.com: Does the card work with pci-assign or are both broken? It works with pci-assign. :-\ Possible there's a bug in how we're managing the vector table and pba here. Can you get to the monitor and run 'into mtree' and provide the results? Thanks, Please see attachment. Thanks, //richard(qemu) info mtree info mtree memory -7ffe (prio 0, RW): system -dfff (prio 0, RW): alias ram-below-4g @pc.ram -dfff 000a-000b (prio 1, RW): alias smram-region @pci 000a-000b 000c-000c3fff (prio 1, R-): alias pam-rom @pc.ram 000c-000c3fff 000c4000-000c7fff (prio 1, R-): alias pam-rom @pc.ram 000c4000-000c7fff 000c8000-000cbfff (prio 1, R-): alias pam-rom @pc.ram 000c8000-000cbfff 000cb000-000cdfff (prio 1000, RW): alias kvmvapic-rom @pc.ram 000cb000-000cdfff 000cc000-000c (prio 1, R-): alias pam-rom @pc.ram 000cc000-000c 000d-000d3fff (prio 1, RW): alias pam-ram @pc.ram 000d-000d3fff 000d4000-000d7fff (prio 1, RW): alias pam-ram @pc.ram 000d4000-000d7fff 000d8000-000dbfff (prio 1, RW): alias pam-ram @pc.ram 000d8000-000dbfff 000dc000-000d (prio 1, RW): alias pam-ram @pc.ram 000dc000-000d 000e-000e3fff (prio 1, RW): alias pam-ram @pc.ram 000e-000e3fff 000e4000-000e7fff (prio 1, RW): alias pam-ram @pc.ram 000e4000-000e7fff 000e8000-000ebfff (prio 1, RW): alias pam-ram @pc.ram 000e8000-000ebfff 000ec000-000e (prio 1, RW): alias pam-ram @pc.ram 000ec000-000e 000f-000f (prio 1, R-): alias pam-rom @pc.ram 000f-000f e000- (prio 0, RW): alias pci-hole @pci e000- fec0-fec00fff (prio 0, RW): kvm-ioapic fed0-fed003ff (prio 0, RW): hpet fee0-feef (prio 0, RW): kvm-apic-msi 0001-00019fff (prio 0, RW): alias ram-above-4g @pc.ram e000-00017fff 0001a000-40019fff (prio 0, RW): alias pci-hole64 @pci 0001a000-40019fff I/O - (prio 0, RW): io 0020-0021 (prio 0, RW): kvm-pic 0040-0043 (prio 0, RW): kvm-pit 0060-0060 (prio 0, RW): i8042-data 0061-0061 (prio 0, RW): elcr 0064-0064 (prio 0, RW): i8042-cmd 0070-0071 (prio 0, 
RW): rtc 007e-007f (prio 0, RW): kvmvapic 0092-0092 (prio 0, RW): port92 00a0-00a1 (prio 0, RW): kvm-pic 0170-0177 (prio 0, RW): alias ide @ide 0170-0177 01ce-01d0 (prio 0, RW): alias vbe @vbe 01ce-01d0 01f0-01f7 (prio 0, RW): alias ide @ide 01f0-01f7 0376-0376 (prio 0, RW): alias ide @ide 0376-0376 0378-037f (prio 0, RW): alias parallel @parallel 0378-037f 03b4-03b5 (prio 0, RW): alias vga @vga 03b4-03b5 03ba-03ba (prio 0, RW): alias vga @vga 03ba-03ba 03c0-03cf (prio 0, RW): alias vga @vga 03c0-03cf 03d4-03d5 (prio 0, RW): alias vga @vga 03d4-03d5 03da-03da (prio 0, RW): alias vga @vga 03da-03da 03f1-03f5 (prio 0, RW): alias fdc @fdc 03f1-03f5 03f6-03f6 (prio 0, RW): alias ide @ide 03f6-03f6 03f7-03f7 (prio 0, RW): alias fdc @fdc 03f7-03f7 03f8-03ff (prio 0, RW): serial 04d0-04d0 (prio 0, RW): kvm-elcr 04d1-04d1 (prio 0, RW): kvm-elcr 0510-0511 (prio 0, RW): fwcfg 0cf8-0cfb (prio 0, RW): pci-conf-idx 0cfc-0cff (prio 0, RW): pci-conf-data 5658-5658 (prio 0,
Re: [PATCH 2/4] Expand the steal time msr to also contain the consigned time.
On 02/05/2013 04:49 PM, Michael Wolf wrote:
> Expand the steal time msr to also contain the consigned time.
>
> Signed-off-by: Michael Wolf m...@linux.vnet.ibm.com
> ---
>  arch/x86/include/asm/paravirt.h       |  4 ++--
>  arch/x86/include/asm/paravirt_types.h |  2 +-
>  arch/x86/kernel/kvm.c                 |  7 ++-
>  kernel/sched/core.c                   | 10 +-
>  kernel/sched/cputime.c                |  2 +-
>  5 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 5edd174..9b753ea 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -196,9 +196,9 @@ struct static_key;
>  extern struct static_key paravirt_steal_enabled;
>  extern struct static_key paravirt_steal_rq_enabled;
>
> -static inline u64 paravirt_steal_clock(int cpu)
> +static inline void paravirt_steal_clock(int cpu, u64 *steal)
>  {
> -	return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
> +	PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
>  }

This may be a stupid question, but what happens if a KVM guest with this change runs on a kernel that still has the old steal time interface?

What happens if the host has the new steal time interface, but the guest uses the old interface?

Will both cases continue to work as expected with your patch series?

If so, could you document (in the source code) why things continue to work?
Re: [PATCH 3/4] Add the code to send the consigned time from the host to the guest
On 02/05/2013 04:49 PM, Michael Wolf wrote:
> Change the paravirt calls that retrieve the steal-time information
> from the host. Add to it getting the consigned value as well as
> the steal time.
>
> Signed-off-by: Michael Wolf m...@linux.vnet.ibm.com
>
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 06fdbd9..55d617f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -42,9 +42,10 @@
>
>  struct kvm_steal_time {
>  	__u64 steal;
> +	__u64 consigned;
>  	__u32 version;
>  	__u32 flags;
> -	__u32 pad[12];
> +	__u32 pad[10];
>  };

The function kvm_register_steal_time passes the address of such a structure to the host kernel, which then does something with it.

Could running a guest with the above patch, on top of a host with the old code, result in the values for version and flags being written into consigned?

Could that result in confusing the guest kernel to no end, and generally breaking things?
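The layout concern can be checked mechanically. The sketch below copies the two field orders from the quoted hunk into user-space structs (the struct names `steal_time_old` and `steal_time_new` and the helper are made up for illustration, and `uint64_t`/`uint32_t` stand in for `__u64`/`__u32`) and compares field offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch, as shown in the quoted hunk. */
struct steal_time_old {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[12];
};

/* Layout after the patch: consigned is inserted, pad shrinks by 8 bytes. */
struct steal_time_new {
	uint64_t steal;
	uint64_t consigned;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[10];
};

/* The overall size is unchanged, so the mismatch is silent. */
static_assert(sizeof(struct steal_time_old) == 64, "old layout is 64 bytes");
static_assert(sizeof(struct steal_time_new) == 64, "new layout is 64 bytes");

/*
 * Returns nonzero if a writer using the old layout would store `version`
 * inside the bytes that a reader using the new layout treats as `consigned`.
 */
static int old_version_overlaps_consigned(void)
{
	size_t old_off = offsetof(struct steal_time_old, version);
	size_t start = offsetof(struct steal_time_new, consigned);
	size_t end = start + sizeof(uint64_t);

	return old_off >= start && old_off < end;
}
```

Under these assumptions the old `version` and `flags` fields sit at offsets 8 and 12, exactly the eight bytes the new layout assigns to `consigned`, which is the scenario the question describes.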
RE: [PATCH V5] target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs
Marcelo,

Hi, I have been watching for this patch in the upstream but have not seen it yet. What version of QEMU should it be in?

Thanks,

Will

-----Original Message-----
From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
Sent: Friday, November 30, 2012 12:40 PM
To: Auld, Will
Cc: qemu-devel; Gleb; Andreas Farber; kvm@vger.kernel.org; Dugger, Donald D; Liu, Jinsong; Zhang, Xiantao; a...@redhat.com
Subject: Re: [PATCH V5] target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

On Mon, Nov 26, 2012 at 09:32:18PM -0800, Will Auld wrote:
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
>
> Basic design is to emulate the MSR by allowing reads and writes to the
> hypervisor vcpu specific locations to store the value of the emulated MSRs.
> In this way the IA32_TSC_ADJUST value will be included in all reads to
> the TSC MSR whether through rdmsr or rdtsc.
>
> As this is a new MSR that the guest may access and modify, its value needs
> to be migrated along with the other MSRs. The changes here are specifically
> for recognizing when IA32_TSC_ADJUST is enabled in CPUID and code added
> for migrating its value.
>
> Signed-off-by: Will Auld will.a...@intel.com
> ---
> Andreas,
>
> Thanks, that helped. I used Stefan's auto-run method this time.
>
> Will
>
>  target-i386/cpu.h     |  2 ++
>  target-i386/kvm.c     | 14 ++
>  target-i386/machine.c | 21 +
>  3 files changed, 37 insertions(+)

Applied, thanks.
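As a rough illustration of the two facts stated in the patch description (feature bit 1 of CPUID.7.0.EBX, and the adjust value being folded into every TSC read), here is a small user-space sketch. It is not the QEMU code: `has_tsc_adjust`, `toy_vcpu` and `guest_tsc` are hypothetical names, and the macro names are chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSR_IA32_TSC_ADJUST		0x3b		/* MSR index from the description */
#define TOY_CPUID_7_0_EBX_TSC_ADJUST	(1u << 1)	/* CPUID.(EAX=7,ECX=0):EBX bit 1 */

/* True if the EBX value returned by CPUID leaf 7, subleaf 0 advertises the MSR. */
static bool has_tsc_adjust(uint32_t cpuid_7_0_ebx)
{
	return (cpuid_7_0_ebx & TOY_CPUID_7_0_EBX_TSC_ADJUST) != 0;
}

/* Hypothetical per-vcpu storage for the emulated MSR value. */
struct toy_vcpu {
	uint64_t tsc_adjust;
};

/*
 * The value a guest would observe for the TSC in this toy model: the
 * underlying counter plus the adjust value, with 64-bit wraparound so a
 * "negative" adjustment also works.
 */
static uint64_t guest_tsc(const struct toy_vcpu *vcpu, uint64_t base_tsc)
{
	assert(vcpu != 0);
	return base_tsc + vcpu->tsc_adjust;
}
```

Because the adjust value changes what the guest sees, it is part of the guest-visible state, which is why the description says it has to be migrated with the other MSRs.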
Re: DMAR faults from unrelated device when vfio is used
On Wed, 2013-02-06 at 21:25 +0100, Richard Weinberger wrote:
> Hi,
>
> On Wed, 06 Feb 2013 11:47:20 -0700, Alex Williamson alex.william...@redhat.com wrote:
> Does the card work with pci-assign or are both broken?
>
> It works with pci-assign. :-\

When you tested this, did you detach the group from vfio or use it as is? In your previous message I see this:

03:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev ff)

/sys/kernel/iommu_groups/7/devices:
total 0
lrwxrwxrwx 1 root root 0 Feb 4 10:29 :00:1c.0 -> ../../../../devices/pci:00/:00:1c.0
lrwxrwxrwx 1 root root 0 Feb 4 10:29 :00:1c.6 -> ../../../../devices/pci:00/:00:1c.6
lrwxrwxrwx 1 root root 0 Feb 4 10:29 :03:00.0 -> ../../../../devices/pci:00/:00:1c.6/:03:00.0

This seemed like a good card to have in my test cache, so I went and got one and it works fine for me... but I've been playing with pcieport because I don't think we're handling them correctly in vfio. Can you provide lspci -vvv -s 1c.6 while the guest is running? I'm going to bet that Control: I/O+ Mem+ BusMaster+ is not set, which it would have been if pci-assign was tested without the group bound to vfio. I think the solution is going to be something around white-listing pcieport, which you can easily test with a kernel patch like this:

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12c264d..48a97fb 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -442,7 +442,7 @@ static struct vfio_device *vfio_group_get_device(struct vfio
  * a device. It's not always practical to leave a device within a group
  * driverless as it could get re-bound to something unsafe.
  */
-static const char * const vfio_driver_whitelist[] = { "pci-stub" };
+static const char * const vfio_driver_whitelist[] = { "pci-stub", "pcieport" };
 
 static bool vfio_whitelisted_driver(struct device_driver *drv)
 {

Then you won't need to bind 1c.0 or 1c.6 to vfio-pci and hopefully things will work.

The other problem you might hit is that the pciehp service driver may also be bound to these slots and somehow deletes the pci device and re-adds it when a device reset happens. This causes all sorts of badness. The solution here is to unbind the child device from pciehp, i.e.:

echo :00:1c.0:pcie04 | sudo \
	tee /sys/bus/pci_express/drivers/pciehp/unbind
echo :00:1c.6:pcie04 | sudo \
	tee /sys/bus/pci_express/drivers/pciehp/unbind

Hopefully combined that will make things work, please let me know. Another option is to move the device to a slot where it isn't grouped with the root port above it, assuming it's a plugin card. Also if we could determine that these root ports support PCI ACS but just don't report it, we could change the grouping and avoid root ports grouped with devices.

I'm still trying to formulate how to fix this long term, whether we should whitelist pcieport and require userspace to do this kind of set (need a hotplug stub driver?) or if vfio-pci needs to gain some basic pcieport functionality that can enable the device and bind service drivers we want (aer) and avoid ones we don't (pciehp). Thanks,

Alex
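For readers unfamiliar with how such a whitelist is consulted, the idea reduces to a string lookup on the driver name. The following is a user-space sketch only: the real `vfio_whitelisted_driver()` takes a `struct device_driver *`, whereas the hypothetical `whitelisted_driver()` below takes the name directly so that it can be compiled outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Driver names considered safe to leave bound inside a vfio group. */
static const char * const driver_whitelist[] = { "pci-stub", "pcieport" };

/* Returns true if the given driver name appears in the whitelist. */
static bool whitelisted_driver(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(driver_whitelist) / sizeof(driver_whitelist[0]); i++)
		if (strcmp(name, driver_whitelist[i]) == 0)
			return true;
	return false;
}
```

With "pcieport" added, a root port bound to its normal driver no longer makes the group non-viable in this model, while a driver such as pciehp is still rejected.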
Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting
On Tue, Feb 05, 2013 at 09:32:50AM +0200, Gleb Natapov wrote:
On Mon, Feb 04, 2013 at 06:47:30PM -0200, Marcelo Tosatti wrote:
On Mon, Feb 04, 2013 at 05:59:52PM -0200, Marcelo Tosatti wrote:
On Mon, Feb 04, 2013 at 07:13:01PM +0200, Gleb Natapov wrote:
On Mon, Feb 04, 2013 at 12:43:45PM -0200, Marcelo Tosatti wrote:

Any example how software relies on such two-interrupts-queued-in-IRR/ISR behaviour?

Don't know about guests, but KVM relies on it to detect interrupt coalescing. So if interrupt is set in IRR but not in PIR interrupt will not be reported as coalesced, but it will be coalesced during PIR->IRR merge.

Yes, so:

1. IRR=1, ISR=0, PIR=0. Event: set_irq, coalesced=no.
2. IRR=0, ISR=1, PIR=0. Event: IRR->ISR transfer.
3. vcpu outside of guest mode.
4. IRR=1, ISR=1, PIR=0. Event: set_irq, coalesced=no.
5. vcpu enters guest mode.
6. IRR=1, ISR=1, PIR=1. Event: set_irq, coalesced=no.
7. HW transfers PIR into IRR.

set_irq return value at 7 is incorrect, interrupt event was _not_ queued.

Not sure I understand the flow of events in your description correctly. As I understand it at 4 set_irq() will return incorrect result. Basically when PIR is set to 1 while IRR has 1 for the vector the value of set_irq() will be incorrect.

At 4 it has not been coalesced: it has been queued to IRR.
At 6 it has been coalesced: PIR bit merged into IRR bit.

Yes, that's the case.

Frankly I do not see how it can be fixed without any race with present HW PIR design.

At kvm_accept_apic_interrupt, check IRR before setting PIR bit, if IRR already set, don't set PIR.

Need to check both IRR and PIR. Something like that:

apic_accept_interrupt() {
	if (PIR || IRR)
		return coalesced;
	else
		set PIR;
}

This has two problems. First is that interrupt that can be delivered will be not (IRR is cleared just after it was tested), but it will be reported as coalesced, so this is benign race.

Yes, and the same condition exists today with IRR, it's fine.

Second is that interrupt may be reported as delivered, but it will be coalesced (possible only with the self IPI with the same vector):

Starting condition: PIR=0, IRR=0 vcpu is in a guest mode

io thread               |           vcpu
accept_apic_interrupt() |
 PIR and IRR is zero    |
 set PIR                |
 return delivered       |
                        | self IPI
                        | set IRR
                        | merge PIR to IRR (*)

At (*) interrupt that was reported as delivered is coalesced.

Only vcpu itself should send self-IPI, so it's fine.

Or:

apic_accept_interrupt() {
	1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
	   Never set IRR when HWAPIC enabled, even if outside of guest mode.
	2. Set PIR and let HW or SW VM-entry transfer it to IRR.
	3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
}

This can report interrupt as coalesced, but it will be eventually delivered as separate interrupt:

Starting condition: PIR=0, IRR=1 vcpu is in a guest mode

io thread               |         vcpu
                        |
accept_apic_interrupt() |
ORIG_PIR=0, ORIG_IRR=1  |
                        | EOI
                        | clear IRR, set ISR
set PIR                 |
return coalesced        |
                        | clear PIR, set IRR
                        | EOI
                        | clear IRR, set ISR (*)

At (*) interrupt that was reported as coalesced is delivered.

So still no perfect solution. But first one has much less serious problems for our practical needs.

Two or more concurrent set_irq can race with each other, though. Can either document the race or add a lock.

--
	Gleb.

Ok, then:

accept_apic_irq:
1. coalesced = test_and_set_bit(PIR)
2. set KVM_REQ_EVENT bit (*)
3. if (vcpu->in_guest_mode)
4.     if (test_and_set_bit(pir notification bit))
5.         send PIR IPI
6. return coalesced

Other sites:
A: On VM-entry, after disabling interrupts, but before the last check for ->requests, clear pir notification bit (unconditionally).

(*) This is _necessary_ also because during VM-exit a PIR IPI interrupt can be missed, so the KVM_REQ_EVENT indicates that SW is responsible for PIR->IRR transfer.
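The six-step accept_apic_irq proposal at the end of the message can be modelled in user space with C11 atomics. This is a sketch under stated assumptions, not KVM code: `toy_vcpu`, `ipis_sent` and `test_and_set_bit_ul` are invented for the example, PIR is reduced to a single word, and step 4 is read as "send the notification only if it was not already pending" (the pseudo-code in the mail writes the test without a negation, so this reading is an assumption).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vcpu {
	atomic_ulong pir;		/* posted requests, one bit per vector (toy size) */
	atomic_bool pir_notify;		/* the "pir notification bit" */
	atomic_bool req_event;		/* stands in for KVM_REQ_EVENT */
	atomic_bool in_guest_mode;
	int ipis_sent;			/* counts notification IPIs, for illustration only */
};

/* Atomically set bit nr and report whether it was already set. */
static bool test_and_set_bit_ul(atomic_ulong *word, unsigned int nr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Returns true if the vector was already posted, i.e. the request coalesced. */
static bool accept_apic_irq(struct toy_vcpu *v, unsigned int vector)
{
	bool coalesced;

	assert(vector < 8 * sizeof(unsigned long));

	coalesced = test_and_set_bit_ul(&v->pir, vector);	/* step 1 */
	atomic_store(&v->req_event, true);			/* step 2 */
	if (atomic_load(&v->in_guest_mode)) {			/* step 3 */
		/* step 4 (assumed polarity): only the first poster notifies */
		if (!atomic_exchange(&v->pir_notify, true))
			v->ipis_sent++;				/* step 5 */
	}
	return coalesced;					/* step 6 */
}
```

The coalesced result here depends only on PIR, which matches step 1 of the proposal; it does not look at IRR, so the IRR-related races discussed earlier in the message are outside what this sketch models.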
Re: [PATCH V5] target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs
On Wed, Feb 06, 2013 at 10:22:32PM +, Auld, Will wrote:
> Marcelo,
>
> Hi, I have been watching for this patch in the upstream but have not
> seen it yet. What version of QEMU should it be in?
>
> Thanks,
>
> Will

Will, it's in the GIT tree:

https://github.com/qemu/qemu/commit/f28558d3d37ad3bc4e35e8ac93f7bf81a0d5622c

As for the next release:

http://www.mail-archive.com/qemu-devel@nongnu.org/msg153579.html
Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting
On Wed, Feb 06, 2013 at 08:49:23PM -0200, Marcelo Tosatti wrote: On Tue, Feb 05, 2013 at 09:32:50AM +0200, Gleb Natapov wrote: On Mon, Feb 04, 2013 at 06:47:30PM -0200, Marcelo Tosatti wrote: On Mon, Feb 04, 2013 at 05:59:52PM -0200, Marcelo Tosatti wrote: On Mon, Feb 04, 2013 at 07:13:01PM +0200, Gleb Natapov wrote: On Mon, Feb 04, 2013 at 12:43:45PM -0200, Marcelo Tosatti wrote: Any example how software relies on such two-interrupts-queued-in-IRR/ISR behaviour? Don't know about guests, but KVM relies on it to detect interrupt coalescing. So if interrupt is set in IRR but not in PIR interrupt will not be reported as coalesced, but it will be coalesced during PIR-IRR merge. Yes, so: 1. IRR=1, ISR=0, PIR=0. Event: set_irq, coalesced=no. 2. IRR=0, ISR=1, PIR=0. Event: IRR-ISR transfer. 3. vcpu outside of guest mode. 4. IRR=1, ISR=1, PIR=0. Event: set_irq, coalesced=no. 5. vcpu enters guest mode. 6. IRR=1, ISR=1, PIR=1. Event: set_irq, coalesced=no. 7. HW transfers PIR into IRR. set_irq return value at 7 is incorrect, interrupt event was _not_ queued. Not sure I understand the flow of events in your description correctly. As I understand it at 4 set_irq() will return incorrect result. Basically when PIR is set to 1 while IRR has 1 for the vector the value of set_irq() will be incorrect. At 4 it has not been coalesced: it has been queued to IRR. At 6 it has been coalesced: PIR bit merged into IRR bit. Yes, that's the case. Frankly I do not see how it can be fixed without any race with present HW PIR design. At kvm_accept_apic_interrupt, check IRR before setting PIR bit, if IRR already set, don't set PIR. Need to check both IRR and PIR. Something like that: apic_accept_interrupt() { if (PIR || IRR) return coalesced; else set PIR; } This has two problems. Firs is that interrupt that can be delivered will be not (IRR is cleared just after it was tested), but it will be reported as coalesced, so this is benign race. 
Yes, and the same condition exists today with IRR, its fine. Second is that interrupt may be reported as delivered, but it will be coalesced (possible only with the self IPI with the same vector): Starting condition: PIR=0, IRR=0 vcpu is in a guest mode io thread | vcpu accept_apic_interrupt() | PIR and IRR is zero | set PIR | return delivered | | self IPI | set IRR | merge PIR to IRR (*) At (*) interrupt that was reported as delivered is coalesced. Only vcpu itself should send self-IPI, so its fine. Or: apic_accept_interrupt() { 1. Read ORIG_PIR=PIR, ORIG_IRR=IRR. Never set IRR when HWAPIC enabled, even if outside of guest mode. 2. Set PIR and let HW or SW VM-entry transfer it to IRR. 3. set_irq return value: (ORIG_PIR or ORIG_IRR set). } This can report interrupt as coalesced, but it will be eventually delivered as separate interrupt: Starting condition: PIR=0, IRR=1 vcpu is in a guest mode io thread | vcpu | accept_apic_interrupt() | ORIG_PIR=0, ORIG_IRR=1 | |EOI |clear IRR, set ISR set PIR | return coalesced| |clear PIR, set IRR |EOI |clear IRR, set ISR (*) At (*) interrupt that was reported as coalesced is delivered. So still no perfect solution. But first one has much less serious problems for our practical needs. Two or more concurrent set_irq can race with each other, though. Can either document the race or add a lock. -- Gleb. Ok, then: accept_apic_irq: 1. coalesced = test_and_set_bit(PIR) 2. set KVM_REQ_EVENT bit (*) 3. if (vcpu-in_guest_mode) 4.if (test_and_set_bit(pir notification bit)) 5.send PIR IPI 6. return coalesced Other sites: A: On VM-entry, after disabling interrupts, but before the last check for -requests, clear pir notification bit (unconditionally). (*) This is _necessary_ also because during VM-exit a PIR IPI interrupt can be missed, so the KVM_REQ_EVENT indicates that SW is responsible for PIR-IRR transfer. Its not a bad idea to have a new KVM_REQ_ bit for PIR processing (just as the current patches do). 
Re: [PATCH] tcm_vhost: Multi-queue support
On 02/06/2013 07:59 PM, Paolo Bonzini wrote:
> ----- Original Message -----
> From: Asias He as...@redhat.com
> To: Nicholas A. Bellinger n...@linux-iscsi.org
> Cc: Paolo Bonzini pbonz...@redhat.com, Stefan Hajnoczi stefa...@redhat.com, Michael S. Tsirkin m...@redhat.com, Rusty Russell ru...@rustcorp.com.au, kvm@vger.kernel.org, virtualizat...@lists.linux-foundation.org, target-de...@vger.kernel.org
> Sent: Wednesday, 6 February 2013 10:51:34
> Subject: Re: [PATCH] tcm_vhost: Multi-queue support
>
> On 02/06/2013 04:39 PM, Nicholas A. Bellinger wrote:
> On Wed, 2013-02-06 at 15:09 +0800, Asias He wrote:
> On 02/06/2013 02:45 PM, Nicholas A. Bellinger wrote:
> On Wed, 2013-02-06 at 13:20 +0800, Asias He wrote:
>
> This adds virtio-scsi multi-queue support to tcm_vhost. Guest side
> virtio-scsi multi-queue support can be found here:
>
> https://lkml.org/lkml/2012/12/18/166
>
> Some initial perf numbers:
>
> 1 queue, 4 targets, 1 lun per target
> 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
>
> 4 queues, 4 targets, 1 lun per target
> 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
>
> 4 VCPUs I suppose?

Yes.

--
Asias
Re: [PATCH v3] KVM: MMU: lazily drop large spte
On Tue, Feb 05, 2013 at 03:11:09PM +0800, Xiao Guangrong wrote:
> Currently, kvm zaps the large spte if write protection is needed, so a
> later read can fault on that spte. Instead, we can make the large spte
> read-only rather than non-present, so the page fault caused by read
> access can be avoided.
>
> The idea is from Avi:
> | As I mentioned before, write-protecting a large spte is a good idea,
> | since it moves some work from protect-time to fault-time, so it reduces
> | jitter. This removes the need for the return value.
>
> Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com

Applied, thanks.
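The difference between zapping and write-protecting an entry can be shown with plain bit operations. The sketch below uses the legacy x86 page-table bit positions (present, writable, page-size) purely for illustration; it does not use KVM's own spte macros, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_PRESENT		(1ULL << 0)
#define TOY_PTE_WRITABLE	(1ULL << 1)
#define TOY_PTE_LARGE		(1ULL << 7)

/* Old behaviour in the description: drop the mapping entirely. */
static uint64_t zap_spte(uint64_t spte)
{
	(void)spte;
	return 0;
}

/* New behaviour in the description: keep the mapping, remove write access. */
static uint64_t write_protect_spte(uint64_t spte)
{
	return spte & ~TOY_PTE_WRITABLE;
}

/* A read faults only when the entry is not present. */
static bool read_faults(uint64_t spte)
{
	return (spte & TOY_PTE_PRESENT) == 0;
}

/* A write faults when the entry is not present or not writable. */
static bool write_faults(uint64_t spte)
{
	return read_faults(spte) || (spte & TOY_PTE_WRITABLE) == 0;
}
```

In this model a zapped entry faults on both reads and writes, while a write-protected large entry still satisfies reads and only faults on the first write, which is where the work is deferred to.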
Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting
According to the SDM, software should not touch the IRR when the target vcpu is running. Instead, use a locked access to the PIR. So your solution may be wrong.

Then your apicv patches are broken, because they do exactly that.

Which code is broken?

The one that updates IRR directly on the apic page.

No, all the updates ensure the target vcpu is not running. So it's safe to touch IRR.

Not at all. Read the code.

Sorry. I still cannot figure out which code is wrong. All the places that call sync_pir_to_irr() are on the target vcpu. Can you point out the code? Thanks.

I am talking about the vapic patches which are already in, not the pir patches.

Yes, but the issue will be fixed with the pir patches. With posted interrupts, it will touch PIR instead of IRR, and accessing PIR is allowed by HW.

Best regards,
Yang

From http://www.mail-archive.com/kvm@vger.kernel.org/msg82824.html:

2. Section 29.6 mentions that "Use of the posted-interrupt descriptor differs from that of other data structures that are referenced by pointers in a VMCS. There is a general requirement that software ensure that each such data structure is modified only when no logical processor with a current VMCS that references it is in VMX non-root operation. That requirement does not apply to the posted-interrupt descriptor. There is a requirement, however, that such modifications be done using locked read-modify-write instructions."

The APIC virtual page is being modified by a CPU while a logical processor with a current VMCS that references it is in VMX non-root operation, in fact even modifying the APIC virtual page with EOI virtualization, virtual interrupt delivery, etc. What are the requirements in this case?

It should be the same as with posted interrupts. Software must ensure it uses atomic access to the virtual APIC page.

Can this point be clarified? Can software access the virtual APIC page while a VMCS that references it is in VMX non-root operation, or not? Because if it cannot, then it means the current code is broken and VID usage without PIR should not be allowed.
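To make the "locked read-modify-write" requirement quoted from Section 29.6 concrete, here is a minimal user-space sketch using C11 atomics. It is not the KVM code under discussion: the structures and function names are hypothetical, and the model keeps only a 256-bit request bitmap and leaves out the rest of the posted-interrupt descriptor. On x86, compilers typically emit locked instructions for these atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256
#define PIR_WORDS  (NR_VECTORS / 32)

/* Hypothetical stand-ins: a request bitmap that other CPUs may update
 * at any time, and an IRR that only the owning vcpu touches. */
struct pi_desc_sketch {
    _Atomic uint32_t pir[PIR_WORDS];
};

struct vapic_sketch {
    uint32_t irr[PIR_WORDS];
};

/* Sender side: set the vector's bit with an atomic read-modify-write.
 * Returns true if the bit was already pending. */
bool pir_post(struct pi_desc_sketch *pi, unsigned int vector)
{
    assert(vector < NR_VECTORS);
    uint32_t mask = UINT32_C(1) << (vector % 32);
    uint32_t old = atomic_fetch_or(&pi->pir[vector / 32], mask);
    return (old & mask) != 0;
}

/* Target side: atomically take each PIR word and merge it into the IRR.
 * Assumed to run only on the vcpu that owns the IRR. */
void pir_sync_to_irr(struct pi_desc_sketch *pi, struct vapic_sketch *apic)
{
    for (int i = 0; i < PIR_WORDS; i++) {
        uint32_t pending = atomic_exchange(&pi->pir[i], 0);
        apic->irr[i] |= pending;
    }
}
```

The point of the split is that senders never write the IRR: they only do atomic updates on the request bitmap, and the plain (non-atomic) IRR update happens in one place, on the target. Whether a direct IRR update on the virtual APIC page is also permitted is exactly the open question in the thread, and this sketch does not answer it.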