Re: [PATCH v2 2/5] KVM: x86: Emulator performs code segment checks on read access
Radim, Paolo, Sorry for the late responses (due to holidays)… On Oct 11, 2014, at 12:39 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 10/10/2014 17:54, Radim Krčmář ha scritto: One exception is the case of conforming code segment. The SDM says: Use a code-segment override prefix (CS) to read a readable... [it is] valid because the DPL of the code segment selected by the CS register is the same as the CPL. This is misleading since CS.DPL may be lower (numerically) than CPL, and CS would still be accessible. The emulator should avoid privilage level checks for data reads using CS. Ah, after stripping faulty presumptions, I'm not sure this change is enough ... shouldn't we also skip the check on conforming code segments? Method 2 is always valid because the privilege level of a conforming code segment is effectively the same as the CPL, regardless of its DPL. Radim is right; we need to skip the check on conforming code segments and, once we do that, checking addr.seg is not necessary anymore. That is because, for a CS override on a nonconforming code segment, at the time we fetch the instruction we know that cpl == desc.dpl. The less restrictive data segment check (cpl = desc.dpl) thus always passes. Yes. I was wrong, assuming the code-segment checks are just a derivative of the data segment checks. Let's put together this check and the readability check, too, since we are adding another if (fetch). Can you guys think of a way to simplify the following untested patch? diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 03954f7900f5..9f3e33551db9 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -638,9 +638,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, if ctxt-mode != X86EMUL_MODE_REAL) (desc.type 8)) || !(desc.type 2)) write) goto bad; - /* unreadable code segment */ - if (!fetch (desc.type 8) !(desc.type 2)) - goto bad; lim = desc_limit_scaled(desc); if ((ctxt-mode == X86EMUL_MODE_REAL) !fetch (ctxt-d NoBigReal)) { @@ -660,17 +657,40 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, goto bad; } cpl = ctxt-ops-cpl(ctxt); - if (!(desc.type 8)) { - /* data segment */ + if (fetch (desc.type 8)) { + if (!(desc.type 4)) { + /* nonconforming code segment */ + if (cpl != desc.dpl) + goto bad; + break; + } else { + /* conforming code segment */ + if (cpl desc.dpl) + goto bad; + break; + } + } + + if (likely(!(desc.type 8) || (desc.type 6) == 2)) { + /* + * Data segment or readable, nonconforming code + * segment. The SDM mentions that access through + * a code-segment override prefix is always valid. + * This really only matters for conforming code + * segments (checked below, and always valid anyway): + * for nonconforming ones, cpl == desc.dpl was checked + * when fetching the instruction, meaning the following + * test will always pass too. + */ if (cpl desc.dpl) goto bad; - } else if ((desc.type 8) !(desc.type 4)) { - /* nonconforming code segment */ - if (cpl != desc.dpl) - goto bad; - } else if ((desc.type 8) (desc.type 4)) { - /* conforming code segment */ - if (cpl desc.dpl) + } else { + /* + * These are the (rare) cases that do not behave + * like data segments: nonreadable code segments (bad) + * and readable, conforming code segments (good). + */ + if (!(desc.type 2)) goto bad; } break; Looks good. I’ll give it a try but it is hard to give a definitive answer, since the emulator is still bug-ridden. Please note I submitted another patch at this area (Wrong error code on limit violation during emulation”), so both should be merged. Thanks, Nadav signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards
Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben: In qcow2_update_snapshot_refcount - qcow2_process_discards() - bdrv_discard() may free the Qcow2DiscardRegion which is referenced by next pointer in qcow2_process_discards() now, in next iteration, d = next, so g_free(d) will double-free this Qcow2DiscardRegion. qcow2_snapshot_delete |- qcow2_update_snapshot_refcount |-- qcow2_process_discards |--- bdrv_discard | aio_poll |- aio_dispatch |-- bdrv_co_io_em_complete |--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is bdrv_co_do_rw |--- g_free(d) == free first Qcow2DiscardRegion is okay |--- d = next; == this set is done in QTAILQ_FOREACH_SAFE() macro. |--- g_free(d); == double-free will happen if during previous iteration, bdrv_discard had free this object. Do you have a reproducer for this or did code review lead you to this? At the moment I can't see how bdrv_discard(bs-file) could ever free a Qcow2DiscardRegion of bs, as it's working on a completely different BlockDriverState (which usually won't even be a qcow2 one). bdrv_co_do_rw |- bdrv_co_do_writev |-- bdrv_co_do_pwritev |--- bdrv_aligned_pwritev | qcow2_co_writev |- qcow2_alloc_cluster_link_l2 |-- qcow2_free_any_clusters |--- qcow2_free_clusters | update_refcount |- qcow2_process_discards |-- g_free(d) == In next iteration, this Qcow2DiscardRegion will be double-free. This shouldn't happen in a nested call either, as s-lock can't be taken recursively. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [Bug?] qemu abort when trying to passthrough BCM5719 Gigabit Ethernet
On Sat, Oct 11, 2014 at 01:41:48AM -0600, Alex Williamson wrote: On Sat, 2014-10-11 at 13:58 +0800, zhanghailiang wrote: Hi all, When i try to passthrough BCM5719 Gigabit Ethernet to guest using the qemu master branch, it aborted, and show kvm_set_phys_mem:error registering slot:Bad Address. qemu command: #./qemu/qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -vnc :99 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -drive file=/home/suse11_sp3_64,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi- id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -device pci-assign,host=01:00.1,id=mydevice -net none info about guest and host: host OS: 3.16.5 *guest OS: Novell SuSE Linux Enterprise Server 11 SP3* #cat /proc/cpuinfo processor : 31 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz stepping: 4 microcode : 0x416 cpu MHz : 1926.875 cache size : 20480 KB physical id : 1 siblings: 16 core id : 7 cpu cores : 8 apicid : 47 initial apicid : 47 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms bogomips: 4005.35 clflush size: 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: gdb info: (gdb) bt #0 0x71ce9989 in raise () from /usr/lib64/libc.so.6 #1 0x71ceb098 in abort () from /usr/lib64/libc.so.6 #2 0x556275cf in kvm_set_phys_mem (section=0x7fffedcea790, add=true) at /home/qemu/qemu/kvm-all.c:711 #3 0x5562980f in address_space_update_topology_pass (as=as@entry=0x55d01ca0 address_space_memory, adding=adding@entry=true, new_view=optimized out, new_view=optimized out, old_view=0x7fffe8022a90, old_view=0x7fffe8022a90) at /home/qemu/qemu/memory.c:752 #4 0x5562b910 in address_space_update_topology (as=0x55d01ca0 address_space_memory) at /home/qemu/qemu/memory.c:767 #5 memory_region_transaction_commit () at /home/qemu/qemu/memory.c:808 #6 0x557a75b4 in pci_update_mappings (d=0x562ba9f0) at hw/pci/pci.c:1113 #7 0x557a7932 in pci_default_write_config (d=d@entry=0x562ba9f0, addr=addr@entry=20, val_in=val_in@entry=4294967295, l=l@entry=4) at hw/pci/pci.c:1165 #8 0x5566c17e in assigned_dev_pci_write_config (pci_dev=0x562ba9f0, address=20, val=4294967295, len=4) at /home/qemu/qemu/hw/i386/kvm/pci-assign.c:1196 #9 0x55628fea in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fffedceaae0, size=size@entry=4, access_size_min=optimized out, access_size_max=optimized out, access=0x55629160 memory_region_write_accessor, mr=0x56231f00) at /home/qemu/qemu/memory.c:480 #10 0x5562dbf7 in memory_region_dispatch_write (size=4, data=18446744073709551615, addr=0, mr=0x56231f00) at /home/qemu/qemu/memory.c:1122 #11 io_mem_write (mr=mr@entry=0x56231f00, addr=0, val=optimized out, size=4) at /home/qemu/qemu/memory.c:1958 #12 0x555f8963 in address_space_rw (as=0x55d01d80 address_space_io, addr=addr@entry=3324, buf=0x77fec000 \377\377\377\377, len=len@entry=4, is_write=is_write@entry=true) at /home/qemu/qemu/exec.c:2145 #13 0x55628491 in kvm_handle_io (count=1, size=4, direction=optimized out, data=optimized out, port=3324) at /home/qemu/qemu/kvm-all.c:1614 #14 kvm_cpu_exec (cpu=cpu@entry=0x5620e610) at /home/qemu/qemu/kvm-all.c:1771 #15 0x55617182 in qemu_kvm_cpu_thread_fn (arg=0x5620e610) at /home/qemu/qemu/cpus.c:953 #16 0x76ba2df3 in start_thread () from /usr/lib64/libpthread.so.0 #17 0x71daa3dd in clone () from /usr/lib64/libc.so.6 messages log: Oct 10 07:43:18 localhost kernel: kvm: zapping shadow pages for mmio generation wraparound Oct 10 07:43:27 localhost kernel: kvm [13251]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd Oct 10 07:43:28 localhost kernel: intel_iommu_map: iommu width (48) is not sufficient for the mapped address (fe001000) Oct 10 07:43:28 localhost kernel: kvm_iommu_map_address:iommu failed to map pfn=94880
Re: [Qemu-devel] [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards
On 2014-10-12 15:34, Kevin Wolf wrote: Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben: In qcow2_update_snapshot_refcount - qcow2_process_discards() - bdrv_discard() may free the Qcow2DiscardRegion which is referenced by next pointer in qcow2_process_discards() now, in next iteration, d = next, so g_free(d) will double-free this Qcow2DiscardRegion. qcow2_snapshot_delete |- qcow2_update_snapshot_refcount |-- qcow2_process_discards |--- bdrv_discard | aio_poll |- aio_dispatch |-- bdrv_co_io_em_complete |--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is bdrv_co_do_rw |--- g_free(d) == free first Qcow2DiscardRegion is okay |--- d = next; == this set is done in QTAILQ_FOREACH_SAFE() macro. |--- g_free(d); == double-free will happen if during previous iteration, bdrv_discard had free this object. Do you have a reproducer for this or did code review lead you to this? This problem can be reproduced with loop of savevm - delvm - savem - delvm ..., about 4 hours. When I delete the vm snapshot, qemu crashed with a core file, I debug the core file and find the double-free and the stack. So I add a breakpoint at g_free(d);, and find that indeed a double-free happened, twice free with the same address. And only the first discard region have not happened with double-free. At the moment I can't see how bdrv_discard(bs-file) could ever free a Qcow2DiscardRegion of bs, as it's working on a completely different BlockDriverState (which usually won't even be a qcow2 one). I think the aio_context in bdrv_discard - aio_poll(aio_context, true) is the qemu_aio_context, no matter the bs or bs-file passed to bdrv_discard, so aio_poll(aio_context) will poll all of the aio. bdrv_co_do_rw |- bdrv_co_do_writev |-- bdrv_co_do_pwritev |--- bdrv_aligned_pwritev | qcow2_co_writev |- qcow2_alloc_cluster_link_l2 |-- qcow2_free_any_clusters |--- qcow2_free_clusters | update_refcount |- qcow2_process_discards |-- g_free(d) == In next iteration, this Qcow2DiscardRegion will be double-free. This shouldn't happen in a nested call either, as s-lock can't be taken recursively. Could you detail how s-lock prevent that, above stack is from the gdb, when I add a breakpoint in g_free(d). Thanks, Zhang Haoyu Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: drop unsupported capabilities, fix documentation
No kernel ever reported KVM_CAP_DEVICE_MSIX, KVM_CAP_DEVICE_MSI, KVM_CAP_DEVICE_ASSIGNMENT, KVM_CAP_DEVICE_DEASSIGNMENT. This makes the documentation wrong, and no application ever written to use these capabilities has a chance to work correctly. The only way to detect support is to try, and test errno for ENOTTY. That's unfortunate, but we can't fix the past. Document the actual semantics, and drop the definitions from the exported header to make it easier for application developers to note and fix the bug. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/uapi/linux/kvm.h | 8 Documentation/virtual/kvm/api.txt | 40 --- 2 files changed, 33 insertions(+), 15 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index cf3a2ff..98d72f7 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -647,11 +647,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_MP_STATE 14 #define KVM_CAP_COALESCED_MMIO 15 #define KVM_CAP_SYNC_MMU 16 /* Changes to host mmap are reflected in guest */ -#define KVM_CAP_DEVICE_ASSIGNMENT 17 #define KVM_CAP_IOMMU 18 -#ifdef __KVM_HAVE_MSI -#define KVM_CAP_DEVICE_MSI 20 -#endif /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */ #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21 #ifdef __KVM_HAVE_USER_NMI @@ -665,10 +661,6 @@ struct kvm_ppc_smmu_info { #endif #define KVM_CAP_IRQ_ROUTING 25 #define KVM_CAP_IRQ_INJECT_STATUS 26 -#define KVM_CAP_DEVICE_DEASSIGNMENT 27 -#ifdef __KVM_HAVE_MSIX -#define KVM_CAP_DEVICE_MSIX 28 -#endif #define KVM_CAP_ASSIGN_DEV_IRQ 29 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index beae3fd..0fe195c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -68,9 +68,12 @@ description: Capability: which KVM extension provides this ioctl. Can be 'basic', which means that is will be provided by any kernel that supports - API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which + API version 12 (see section 4.1), a KVM_CAP_xyz constant, which means availability needs to be checked with KVM_CHECK_EXTENSION - (see section 4.4). + (see section 4.4), or 'none' which means that while not all kernels + support this ioctl, there's no capability bit to check its + availability: for kernels that don't support the ioctl, + the ioctl returns -ENOTTY. Architectures: which instruction set architectures provide this ioctl. x86 includes both i386 and x86_64. @@ -1257,7 +1260,7 @@ The flags bitmap is defined as: 4.48 KVM_ASSIGN_PCI_DEVICE -Capability: KVM_CAP_DEVICE_ASSIGNMENT +Capability: none Architectures: x86 ia64 Type: vm ioctl Parameters: struct kvm_assigned_pci_dev (in) @@ -1298,10 +1301,16 @@ Only PCI header type 0 devices with PCI BAR resources are supported by device assignment. The user requesting this ioctl must have read/write access to the PCI sysfs resource files associated with the device. +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. + 4.49 KVM_DEASSIGN_PCI_DEVICE -Capability: KVM_CAP_DEVICE_DEASSIGNMENT +Capability: none Architectures: x86 ia64 Type: vm ioctl Parameters: struct kvm_assigned_pci_dev (in) @@ -1309,9 +1318,14 @@ Returns: 0 on success, -1 on error Ends PCI device assignment, releasing all associated resources. -See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is +See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is used in kvm_assigned_pci_dev to identify the device. +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. 4.50 KVM_ASSIGN_DEV_IRQ @@ -1346,6 +1360,12 @@ The following flags are defined: It is not valid to specify multiple types per host or guest IRQ. However, the IRQ type of host and guest can differ or can even be null. +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. + 4.51 KVM_DEASSIGN_DEV_IRQ @@ -1423,7 +1443,7 @@ struct kvm_irq_routing_s390_adapter { 4.53 KVM_ASSIGN_SET_MSIX_NR -Capability: KVM_CAP_DEVICE_MSIX +Capability: none Architectures: x86 ia64 Type: vm ioctl Parameters: struct kvm_assigned_msix_nr (in) @@ -1445,7 +1465,7 @@ struct kvm_assigned_msix_nr { 4.54 KVM_ASSIGN_SET_MSIX_ENTRY -Capability: KVM_CAP_DEVICE_MSIX +Capability: none Architectures: x86 ia64 Type: vm ioctl Parameters: struct kvm_assigned_msix_entry (in) @@ -1461,6 +1481,12 @@ struct kvm_assigned_msix_entry { __u16 padding[3]; }; +Errors: + ENOTTY: kernel
Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors
On Sat, Oct 11, 2014 at 03:16:44PM +0800, Jason Wang wrote: Below should be useful for some experiments Jason is doing. I thought I'd send it out for early review/feedback. event idx feature allows us to defer interrupts until a specific # of descriptors were used. Sometimes it might be useful to get an interrupt after a specific descriptor, regardless. This adds a descriptor flag for this, and an API to create an urgent output descriptor. This is still an RFC: we'll need a feature bit for drivers to detect this, but we've run out of feature bits for virtio 0.X. For experimentation purposes, drivers can assume this is set, or add a driver-specific feature bit. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com I see that as compared to my original patch, you have added a new flag: VRING_AVAIL_F_NO_URGENT_INTERRUPT I don't think it's necessary, see below. As such, I think this patch should be split: - original patch adding support for urgent descriptors - a patch adding virtqueue_enable/disable_cb_urgent(_prepare)? --- drivers/virtio/virtio_ring.c | 75 +--- include/linux/virtio.h | 14 include/uapi/linux/virtio_ring.h | 5 ++- 3 files changed, 89 insertions(+), 5 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 4d08f45a..a5188c6 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -115,6 +115,7 @@ static inline struct scatterlist *sg_next_arr(struct scatterlist *sg, /* Set up an indirect table of descriptors and add it to the queue. */ static inline int vring_add_indirect(struct vring_virtqueue *vq, + bool urgent, struct scatterlist *sgs[], struct scatterlist *(*next) (struct scatterlist *, unsigned int *), @@ -173,6 +174,8 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq, /* Use a single buffer which doesn't continue */ head = vq-free_head; vq-vring.desc[head].flags = VRING_DESC_F_INDIRECT; + if (urgent) + vq-vring.desc[head].flags |= VRING_DESC_F_URGENT; vq-vring.desc[head].addr = virt_to_phys(desc); /* kmemleak gives a false positive, as it's hidden by virt_to_phys */ kmemleak_ignore(desc); @@ -185,6 +188,7 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq, } static inline int virtqueue_add(struct virtqueue *_vq, + bool urgent, struct scatterlist *sgs[], struct scatterlist *(*next) (struct scatterlist *, unsigned int *), @@ -227,7 +231,7 @@ static inline int virtqueue_add(struct virtqueue *_vq, /* If the host supports indirect descriptor tables, and we have multiple * buffers, then go indirect. FIXME: tune this threshold */ if (vq-indirect total_sg 1 vq-vq.num_free) { - head = vring_add_indirect(vq, sgs, next, total_sg, total_out, + head = vring_add_indirect(vq, urgent, sgs, next, total_sg, total_out, total_in, out_sgs, in_sgs, gfp); if (likely(head = 0)) @@ -256,6 +260,10 @@ static inline int virtqueue_add(struct virtqueue *_vq, for (n = 0; n out_sgs; n++) { for (sg = sgs[n]; sg; sg = next(sg, total_out)) { vq-vring.desc[i].flags = VRING_DESC_F_NEXT; + if (urgent) { + vq-vring.desc[head].flags |= VRING_DESC_F_URGENT; + urgent = false; + } vq-vring.desc[i].addr = sg_phys(sg); vq-vring.desc[i].len = sg-length; prev = i; @@ -265,6 +273,10 @@ static inline int virtqueue_add(struct virtqueue *_vq, for (; n (out_sgs + in_sgs); n++) { for (sg = sgs[n]; sg; sg = next(sg, total_in)) { vq-vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE; + if (urgent) { + vq-vring.desc[head].flags |= VRING_DESC_F_URGENT; + urgent = false; + } vq-vring.desc[i].addr = sg_phys(sg); vq-vring.desc[i].len = sg-length; prev = i; @@ -305,6 +317,8 @@ add_head: /** * virtqueue_add_sgs - expose buffers to other end + * @urgent: in case virtqueue_enable_cb_delayed was called, cause an interrupt + * after this descriptor was completed * @vq: the struct virtqueue we're talking about. * @sgs: array of terminated scatterlists. *
Re: [PATCH 3.10] vhost-net: backport extend device allocation to 3.10
On Thu, Oct 09, 2014 at 08:41:23AM +0400, Dmitry Petuhov wrote: From: Michael S. Tsirkin m...@redhat.com upstream commit 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5 Michael Mueller provided a patch to reduce the size of vhost-net structure as some allocations could fail under memory pressure/fragmentation. We are still left with high order allocations though. This patch is handling the problem at the core level, allowing vhost structures to use vmalloc() if kmalloc() failed. As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. People are still looking at cleaner ways to handle the problem at the API level, probably passing in multiple iovecs. This hack seems consistent with approaches taken since then by drivers/vhost/scsi.c and net/core/dev.c Based on patch by Romain Francoise. Cc: Michael Mueller m...@linux.vnet.ibm.com Signed-off-by: Romain Francoise rom...@orebokech.com Acked-by: Michael S. Tsirkin m...@redhat.com [mityapetuhov: backport to v3.10: vhost_net_free() in one more place] Signed-off-by: Dmitry Petuhov mityapetu...@gmail.com Sounds reasonable. Acked-by: Michael S. Tsirkin m...@redhat.com --- diff -uprN a/drivers/vhost/net.c b/drivers/vhost/net.c --- a/drivers/vhost/net.c 2014-10-09 06:45:08.336283258 +0400 +++ b/drivers/vhost/net.c 2014-10-09 06:51:21.796266607 +0400 @@ -18,6 +18,7 @@ #include linux/rcupdate.h #include linux/file.h #include linux/slab.h +#include linux/vmalloc.h #include linux/net.h #include linux/if_packet.h @@ -707,18 +708,30 @@ static void handle_rx_net(struct vhost_w handle_rx(net); } +static void vhost_net_free(void *addr) +{ + if (is_vmalloc_addr(addr)) + vfree(addr); + else + kfree(addr); +} + static int vhost_net_open(struct inode *inode, struct file *f) { - struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL); + struct vhost_net *n; struct vhost_dev *dev; struct vhost_virtqueue **vqs; int r, i; - if (!n) - return -ENOMEM; + n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); + if (!n) { + n = vmalloc(sizeof *n); + if (!n) + return -ENOMEM; + } vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL); if (!vqs) { - kfree(n); + vhost_net_free(n); return -ENOMEM; } @@ -737,7 +750,7 @@ static int vhost_net_open(struct inode * } r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX); if (r 0) { - kfree(n); + vhost_net_free(n); kfree(vqs); return r; } @@ -840,7 +853,7 @@ static int vhost_net_release(struct inod * since jobs can re-queue themselves. */ vhost_net_flush(n); kfree(n-dev.vqs); - kfree(n); + vhost_net_free(n); return 0; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 09/25] virtio_net: minor cleanup
goto done; done: return; is ugly, it was put there to make diff review easier. replace by open-coded return. Signed-off-by: Michael S. Tsirkin m...@redhat.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/net/virtio_net.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 23e4a69..ef04d23 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1403,7 +1403,7 @@ static void virtnet_config_changed_work(struct work_struct *work) if (virtio_cread_feature(vi-vdev, VIRTIO_NET_F_STATUS, struct virtio_net_config, status, v) 0) - goto done; + return; if (v VIRTIO_NET_S_ANNOUNCE) { netdev_notify_peers(vi-dev); @@ -1414,7 +1414,7 @@ static void virtnet_config_changed_work(struct work_struct *work) v = VIRTIO_NET_S_LINK_UP; if (vi-status == v) - goto done; + return; vi-status = v; @@ -1425,8 +1425,6 @@ static void virtnet_config_changed_work(struct work_struct *work) netif_carrier_off(vi-dev); netif_tx_stop_all_queues(vi-dev); } -done: - return; } static void virtnet_config_changed(struct virtio_device *vdev) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 24/25] virtio_scsi: drop scan callback
Enable VQs early like we do for restore. This makes it possible to drop the scan callback, moving scanning into the probe function, and making code simpler. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/scsi/virtio_scsi.c | 23 +++ 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 327eba0..5f022ff 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -860,17 +860,6 @@ static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq, virtscsi_vq-vq = vq; } -static void virtscsi_scan(struct virtio_device *vdev) -{ - struct Scsi_Host *shost = virtio_scsi_host(vdev); - struct virtio_scsi *vscsi = shost_priv(shost); - - if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) - virtscsi_kick_event_all(vscsi); - - scsi_scan_host(shost); -} - static void virtscsi_remove_vqs(struct virtio_device *vdev) { struct Scsi_Host *sh = virtio_scsi_host(vdev); @@ -1007,10 +996,13 @@ static int virtscsi_probe(struct virtio_device *vdev) err = scsi_add_host(shost, vdev-dev); if (err) goto scsi_add_host_failed; - /* -* scsi_scan_host() happens in virtscsi_scan() via virtio_driver-scan() -* after VIRTIO_CONFIG_S_DRIVER_OK has been set.. -*/ + + virtio_enable_vqs_early(vdev); + + if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) + virtscsi_kick_event_all(vscsi); + + scsi_scan_host(shost); return 0; scsi_add_host_failed: @@ -1090,7 +1082,6 @@ static struct virtio_driver virtio_scsi_driver = { .driver.owner = THIS_MODULE, .id_table = id_table, .probe = virtscsi_probe, - .scan = virtscsi_scan, #ifdef CONFIG_PM_SLEEP .freeze = virtscsi_freeze, .restore = virtscsi_restore, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 25/25] virtio-rng: refactor probe error handling
Code like vi-vq = NULL; kfree(vi) does not make sense. Clean it up, use goto error labels for cleanup. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/char/hw_random/virtio-rng.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c index 132c9cc..72295ea 100644 --- a/drivers/char/hw_random/virtio-rng.c +++ b/drivers/char/hw_random/virtio-rng.c @@ -109,8 +109,8 @@ static int probe_common(struct virtio_device *vdev) vi-index = index = ida_simple_get(rng_index_ida, 0, 0, GFP_KERNEL); if (index 0) { - kfree(vi); - return index; + err = index; + goto err_ida; } sprintf(vi-name, virtio_rng.%d, index); init_completion(vi-have_data); @@ -128,13 +128,16 @@ static int probe_common(struct virtio_device *vdev) vi-vq = virtio_find_single_vq(vdev, random_recv_done, input); if (IS_ERR(vi-vq)) { err = PTR_ERR(vi-vq); - vi-vq = NULL; - kfree(vi); - ida_simple_remove(rng_index_ida, index); - return err; + goto err_find; } return 0; + +err_find: + ida_simple_remove(rng_index_ida, index); +err_ida: + kfree(vi); + return err; } static void remove_common(struct virtio_device *vdev) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 23/25] virtio_balloon: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns, virtio balloon violated this rule by adding bufs, which causes the VQ to be used directly within restore. To fix, call virtio_enable_vqs_early before using VQ. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/virtio/virtio_balloon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 25ebe8e..9629fad 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -538,6 +538,8 @@ static int virtballoon_restore(struct virtio_device *vdev) if (ret) return ret; + virtio_enable_vqs_early(vdev); + fill_balloon(vb, towards_target(vb)); update_balloon_size(vb); return 0; -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 22/25] virtio_scsi: fix race on device removal
We cancel event work on device removal, but an interrupt could trigger immediately after this, and queue it again. To fix, set a flag. Loosely based on patch by Paolo Bonzini Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/scsi/virtio_scsi.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 501838d..327eba0 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -110,6 +110,9 @@ struct virtio_scsi { /* CPU hotplug notifier */ struct notifier_block nb; + /* Protected by event_vq lock */ + bool stop_events; + struct virtio_scsi_vq ctrl_vq; struct virtio_scsi_vq event_vq; struct virtio_scsi_vq req_vqs[]; @@ -303,6 +306,11 @@ static void virtscsi_cancel_event_work(struct virtio_scsi *vscsi) { int i; + /* Stop scheduling work before calling cancel_work_sync. */ + spin_lock_irq(vscsi-event_vq.vq_lock); + vscsi-stop_events = true; + spin_unlock_irq(vscsi-event_vq.vq_lock); + for (i = 0; i VIRTIO_SCSI_EVENT_LEN; i++) cancel_work_sync(vscsi-event_list[i].work); } @@ -390,7 +398,8 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_event_node *event_node = buf; - queue_work(system_freezable_wq, event_node-work); + if (!vscsi-stop_events) + queue_work(system_freezable_wq, event_node-work); } static void virtscsi_event_done(struct virtqueue *vq) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 19/25] virtio_console: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns, virtio console violated this rule by adding inbufs, which causes the VQ to be used directly within restore. To fix, call virtio_enable_vqs_early before using VQs. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/char/virtio_console.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index 6ebe8f6..2ae843f 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -2184,6 +2184,8 @@ static int virtcons_restore(struct virtio_device *vdev) if (ret) return ret; + virtio_enable_vqs_early(portdev-vdev); + if (use_multiport(portdev)) fill_queue(portdev-c_ivq, portdev-c_ivq_lock); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 20/25] virtio_net: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio net violated this rule by using receive VQs within restore. To fix, call virtio_enable_vqs_early before using VQs. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/net/virtio_net.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3551417..6b6e136 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1912,6 +1912,8 @@ static int virtnet_restore(struct virtio_device *vdev) if (err) return err; + virtio_enable_vqs_early(vdev); + if (netif_running(vi-dev)) { for (i = 0; i vi-curr_queue_pairs; i++) if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 21/25] virito_scsi: use freezable WQ for events
From: Paolo Bonzini pbonz...@redhat.com Michael S. Tsirkin noticed a race condition: we reset device on freeze, but system WQ is still running so it might try adding bufs to a VQ meanwhile. To fix, switch to handling events from the freezable WQ. Reported-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/scsi/virtio_scsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 2b565b3..501838d 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -390,7 +390,7 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf) { struct virtio_scsi_event_node *event_node = buf; - schedule_work(event_node-work); + queue_work(system_freezable_wq, event_node-work); } static void virtscsi_event_done(struct virtqueue *vq) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 15/25] virtio_net: fix use after free on allocation failure
In the extremely unlikely event that driver initialization fails after RX buffers are added, virtio net frees RX buffers while VQs are still active, potentially causing device to use a freed buffer. To fix, reset device first - same as we do on device removal. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/net/virtio_net.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 430f3ae..3551417 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1830,6 +1830,8 @@ static int virtnet_probe(struct virtio_device *vdev) return 0; free_recv_bufs: + vi-vdev-config-reset(vdev); + free_receive_bufs(vi); unregister_netdev(dev); free_vqs: -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 14/25] 9p/trans_virtio: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, but virtio 9p device adds self to channel list within probe, at which point VQ can be used in violation of the spec. To fix, call virtio_enable_vqs_early before using VQs. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- net/9p/trans_virtio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 6940d8f..766ba48 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -575,6 +575,8 @@ static int p9_virtio_probe(struct virtio_device *vdev) /* Ceiling limit to avoid denial of service attacks */ chan-p9_max_pages = nr_free_buffer_pages()/4; + virtio_enable_vqs_early(vdev); + mutex_lock(virtio_9p_lock); list_add_tail(chan-chan_list, virtio_chan_list); mutex_unlock(virtio_9p_lock); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 17/25] virtio_blk: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio block violated this rule on restore by restarting queues, which might in theory cause the VQ to be used directly within restore. To fix, call virtio_enable_vqs_early before using starting queues. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/block/virtio_blk.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 46b04bf..1c95af5 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -804,10 +804,13 @@ static int virtblk_restore(struct virtio_device *vdev) int ret; ret = init_vq(vdev-priv); - if (!ret) - blk_mq_start_stopped_hw_queues(vblk-disk-queue, true); + if (ret) + return ret; + + virtio_enable_vqs_early(vdev); - return ret; + blk_mq_start_stopped_hw_queues(vblk-disk-queue, true); + return 0; } #endif -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 13/25] virtio_console: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio console violated this rule by adding inbufs, which causes the VQ to be used directly within probe. To fix, call virtio_enable_vqs_early before using VQs. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/char/virtio_console.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index b585b47..6ebe8f6 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id) spin_lock_init(port-outvq_lock); init_waitqueue_head(port-waitqueue); + virtio_enable_vqs_early(portdev-vdev); + /* Fill the in_vq with buffers so the host can send us data. */ nr_added_bufs = fill_queue(port-in_vq, port-inbuf_lock); if (!nr_added_bufs) { -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 18/25] virtio_scsi: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio scsi violated this rule on restore by kicking event vq within restore. To fix, call virtio_enable_vqs_early before using event queue. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/scsi/virtio_scsi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 0642ce3..2b565b3 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -1054,6 +1054,8 @@ static int virtscsi_restore(struct virtio_device *vdev) return err; } + virtio_enable_vqs_early(vdev); + if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) virtscsi_kick_event_all(vscsi); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 12/25] virtio_blk: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio block violated this rule by calling add_disk, which causes the VQ to be used directly within probe. To fix, call virtio_enable_vqs_early before using VQs. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/block/virtio_blk.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 89ba8d6..46b04bf 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -719,6 +719,8 @@ static int virtblk_probe(struct virtio_device *vdev) if (!err opt_io_size) blk_queue_io_opt(q, blk_size * opt_io_size); + virtio_enable_vqs_early(vdev); + add_disk(vblk-disk); err = device_create_file(disk_to_dev(vblk-disk), dev_attr_serial); if (err) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 11/25] virtio_net: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio net violated this rule by using receive VQs within probe. To fix, call virtio_enable_vqs_early before using VQs. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/net/virtio_net.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ef04d23..430f3ae 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1792,6 +1792,8 @@ static int virtnet_probe(struct virtio_device *vdev) goto free_vqs; } + virtio_enable_vqs_early(vdev); + /* Last of all, set up some receive buffers. */ for (i = 0; i vi-curr_queue_pairs; i++) { try_fill_recv(vi-rq[i], GFP_KERNEL); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 10/25] virtio: add API to enable VQs early
virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/virtio_config.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index e8f8f71..e36403b 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev, return vq; } +/** + * virtio_enable_vqs_early - enable vq use in probe function + * @vdev: the device + * + * Driver must call this to use vqs in the probe function. + * + * Note: vqs are enabled automatically after probe returns. + */ +static inline +void virtio_enable_vqs_early(struct virtio_device *dev) +{ + unsigned status = dev-config-get_status(dev); + + BUG_ON(status VIRTIO_CONFIG_S_DRIVER_OK); + dev-config-set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK); +} + static inline const char *virtio_bus_name(struct virtio_device *vdev) { -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 06/25] virtio-blk: drop config_mutex
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/block/virtio_blk.c | 8 1 file changed, 8 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 91272f1a..89ba8d6 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -41,9 +41,6 @@ struct virtio_blk /* Process context for config space updates */ struct work_struct config_work; - /* Lock for config space updates */ - struct mutex config_lock; - /* What host tells us, plus 2 for header tailer. */ unsigned int sg_elems; @@ -344,8 +341,6 @@ static void virtblk_config_changed_work(struct work_struct *work) char *envp[] = { RESIZE=1, NULL }; u64 capacity, size; - mutex_lock(vblk-config_lock); - /* Host must always specify the capacity. */ virtio_cread(vdev, struct virtio_blk_config, capacity, capacity); @@ -369,8 +364,6 @@ static void virtblk_config_changed_work(struct work_struct *work) set_capacity(vblk-disk, capacity); revalidate_disk(vblk-disk); kobject_uevent_env(disk_to_dev(vblk-disk)-kobj, KOBJ_CHANGE, envp); - - mutex_unlock(vblk-config_lock); } static void virtblk_config_changed(struct virtio_device *vdev) @@ -601,7 +594,6 @@ static int virtblk_probe(struct virtio_device *vdev) vblk-vdev = vdev; vblk-sg_elems = sg_elems; - mutex_init(vblk-config_lock); INIT_WORK(vblk-config_work, virtblk_config_changed_work); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 03/25] virtio-pci: move freeze/restore to virtio core
This is in preparation to extending config changed event handling in core. Wrapping these in an API also seems to make for a cleaner code. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/virtio.h | 6 + drivers/virtio/virtio.c | 54 + drivers/virtio/virtio_pci.c | 54 ++--- 3 files changed, 62 insertions(+), 52 deletions(-) diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 3c19bd3..8df7ba8 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq); /** * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus + * @failed: saved value for CONFIG_S_FAILED bit (for restore) * @dev: underlying device. * @id: the device type identification (used to match it with a driver). * @config: the configuration ops for this device. @@ -88,6 +89,7 @@ bool virtqueue_is_broken(struct virtqueue *vq); */ struct virtio_device { int index; + bool failed; struct device dev; struct virtio_device_id id; const struct virtio_config_ops *config; @@ -109,6 +111,10 @@ void unregister_virtio_device(struct virtio_device *dev); void virtio_break_device(struct virtio_device *dev); void virtio_config_changed(struct virtio_device *dev); +#ifdef CONFIG_PM_SLEEP +int virtio_device_freeze(struct virtio_device *dev); +int virtio_device_restore(struct virtio_device *dev); +#endif /** * virtio_driver - operations for a virtio I/O driver diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 3980687..8216b73 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -248,6 +248,60 @@ void virtio_config_changed(struct virtio_device *dev) } EXPORT_SYMBOL_GPL(virtio_config_changed); +#ifdef CONFIG_PM_SLEEP +int virtio_device_freeze(struct virtio_device *dev) +{ + struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); + + dev-failed = dev-config-get_status(dev) VIRTIO_CONFIG_S_FAILED; + + if (drv drv-freeze) + return drv-freeze(dev); + + return 0; +} +EXPORT_SYMBOL_GPL(virtio_device_freeze); + +int virtio_device_restore(struct virtio_device *dev) +{ + struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); + + /* We always start by resetting the device, in case a previous +* driver messed it up. */ + dev-config-reset(dev); + + /* Acknowledge that we've seen the device. */ + add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); + + /* Maybe driver failed before freeze. +* Restore the failed status, for debugging. */ + if (dev-failed) + add_status(dev, VIRTIO_CONFIG_S_FAILED); + + if (!drv) + return 0; + + /* We have a driver! */ + add_status(dev, VIRTIO_CONFIG_S_DRIVER); + + dev-config-finalize_features(dev); + + if (drv-restore) { + int ret = drv-restore(dev); + if (ret) { + add_status(dev, VIRTIO_CONFIG_S_FAILED); + return ret; + } + } + + /* Finally, tell the device we're all set */ + add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); + + return 0; +} +EXPORT_SYMBOL_GPL(virtio_device_restore); +#endif + static int virtio_init(void) { if (bus_register(virtio_bus) != 0) diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index f39f4e7..d34ebfa 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -57,9 +57,6 @@ struct virtio_pci_device /* Vectors allocated, excluding per-vq vectors if any */ unsigned msix_used_vectors; - /* Status saved during hibernate/restore */ - u8 saved_status; - /* Whether we have vector per vq */ bool per_vq_vectors; }; @@ -764,16 +761,9 @@ static int virtio_pci_freeze(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); - struct virtio_driver *drv; int ret; - drv = container_of(vp_dev-vdev.dev.driver, - struct virtio_driver, driver); - - ret = 0; - vp_dev-saved_status = vp_get_status(vp_dev-vdev); - if (drv drv-freeze) - ret = drv-freeze(vp_dev-vdev); + ret = virtio_device_freeze(vp_dev-vdev); if (!ret) pci_disable_device(pci_dev); @@ -784,54 +774,14 @@ static int virtio_pci_restore(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); - struct virtio_driver *drv; - unsigned status = 0; int ret; - drv = container_of(vp_dev-vdev.dev.driver, - struct
[PATCH v3 04/25] virtio: defer config changed notifications
Defer config changed notifications that arrive during probe/scan/freeze/restore. This will allow drivers to set DRIVER_OK earlier, without worrying about racing with config change interrupts. This change will also benefit old hypervisors (before 2009) that send interrupts without checking DRIVER_OK: previously, the callback could race with driver-specific initialization. This will also help simplify drivers. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/virtio.h | 6 ++ drivers/virtio/virtio.c | 57 + 2 files changed, 54 insertions(+), 9 deletions(-) diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 8df7ba8..5636b11 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -79,6 +79,9 @@ bool virtqueue_is_broken(struct virtqueue *vq); * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus * @failed: saved value for CONFIG_S_FAILED bit (for restore) + * @config_enabled: configuration change reporting enabled + * @config_changed: configuration change reported while disabled + * @config_lock: protects configuration change reporting * @dev: underlying device. * @id: the device type identification (used to match it with a driver). * @config: the configuration ops for this device. @@ -90,6 +93,9 @@ bool virtqueue_is_broken(struct virtqueue *vq); struct virtio_device { int index; bool failed; + bool config_enabled; + bool config_changed; + spinlock_t config_lock; struct device dev; struct virtio_device_id id; const struct virtio_config_ops *config; diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 8216b73..2536701 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -117,6 +117,42 @@ void virtio_check_driver_offered_feature(const struct virtio_device *vdev, } EXPORT_SYMBOL_GPL(virtio_check_driver_offered_feature); +static void __virtio_config_changed(struct virtio_device *dev) +{ + struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); + + if (!dev-config_enabled) + dev-config_changed = true; + else if (drv drv-config_changed) + drv-config_changed(dev); +} + +void virtio_config_changed(struct virtio_device *dev) +{ + unsigned long flags; + + spin_lock_irqsave(dev-config_lock, flags); + __virtio_config_changed(dev); + spin_unlock_irqrestore(dev-config_lock, flags); +} +EXPORT_SYMBOL_GPL(virtio_config_changed); + +static void virtio_config_disable(struct virtio_device *dev) +{ + spin_lock_irq(dev-config_lock); + dev-config_enabled = false; + spin_unlock_irq(dev-config_lock); +} + +static void virtio_config_enable(struct virtio_device *dev) +{ + spin_lock_irq(dev-config_lock); + dev-config_enabled = true; + __virtio_config_changed(dev); + dev-config_changed = false; + spin_unlock_irq(dev-config_lock); +} + static int virtio_dev_probe(struct device *_d) { int err, i; @@ -153,6 +189,8 @@ static int virtio_dev_probe(struct device *_d) add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); if (drv-scan) drv-scan(dev); + + virtio_config_enable(dev); } return err; @@ -163,6 +201,8 @@ static int virtio_dev_remove(struct device *_d) struct virtio_device *dev = dev_to_virtio(_d); struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); + virtio_config_disable(dev); + drv-remove(dev); /* Driver should have reset device. */ @@ -211,6 +251,10 @@ int register_virtio_device(struct virtio_device *dev) dev-index = err; dev_set_name(dev-dev, virtio%u, dev-index); + spin_lock_init(dev-config_lock); + dev-config_enabled = false; + dev-config_changed = false; + /* We always start by resetting the device, in case a previous * driver messed it up. This also tests that code path a little. */ dev-config-reset(dev); @@ -239,20 +283,13 @@ void unregister_virtio_device(struct virtio_device *dev) } EXPORT_SYMBOL_GPL(unregister_virtio_device); -void virtio_config_changed(struct virtio_device *dev) -{ - struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); - - if (drv drv-config_changed) - drv-config_changed(dev); -} -EXPORT_SYMBOL_GPL(virtio_config_changed); - #ifdef CONFIG_PM_SLEEP int virtio_device_freeze(struct virtio_device *dev) { struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); + virtio_config_disable(dev); + dev-failed = dev-config-get_status(dev) VIRTIO_CONFIG_S_FAILED; if (drv drv-freeze) @@ -297,6 +334,8 @@ int virtio_device_restore(struct virtio_device *dev) /* Finally, tell the device we're all set */
[PATCH v3 08/25] virtio-net: drop config_mutex
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/net/virtio_net.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 743fb04..23e4a69 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -132,9 +132,6 @@ struct virtnet_info { /* Work struct for config space updates */ struct work_struct config_work; - /* Lock for config space updates */ - struct mutex config_lock; - /* Does the affinity hint is set for virtqueues? */ bool affinity_hint_set; @@ -1404,7 +1401,6 @@ static void virtnet_config_changed_work(struct work_struct *work) container_of(work, struct virtnet_info, config_work); u16 v; - mutex_lock(vi-config_lock); if (virtio_cread_feature(vi-vdev, VIRTIO_NET_F_STATUS, struct virtio_net_config, status, v) 0) goto done; @@ -1430,7 +1426,7 @@ static void virtnet_config_changed_work(struct work_struct *work) netif_tx_stop_all_queues(vi-dev); } done: - mutex_unlock(vi-config_lock); + return; } static void virtnet_config_changed(struct virtio_device *vdev) @@ -1751,7 +1747,6 @@ static int virtnet_probe(struct virtio_device *vdev) u64_stats_init(virtnet_stats-rx_syncp); } - mutex_init(vi-config_lock); INIT_WORK(vi-config_work, virtnet_config_changed_work); /* If we can receive ANY GSO packets, we must allocate large ones. */ -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 05/25] virtio_blk: drop config_enable
Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio blk. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/block/virtio_blk.c | 23 --- 1 file changed, 4 insertions(+), 19 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 0a58140..91272f1a 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -44,9 +44,6 @@ struct virtio_blk /* Lock for config space updates */ struct mutex config_lock; - /* enable config space updates */ - bool config_enable; - /* What host tells us, plus 2 for header tailer. */ unsigned int sg_elems; @@ -348,8 +345,6 @@ static void virtblk_config_changed_work(struct work_struct *work) u64 capacity, size; mutex_lock(vblk-config_lock); - if (!vblk-config_enable) - goto done; /* Host must always specify the capacity. */ virtio_cread(vdev, struct virtio_blk_config, capacity, capacity); @@ -374,7 +369,7 @@ static void virtblk_config_changed_work(struct work_struct *work) set_capacity(vblk-disk, capacity); revalidate_disk(vblk-disk); kobject_uevent_env(disk_to_dev(vblk-disk)-kobj, KOBJ_CHANGE, envp); -done: + mutex_unlock(vblk-config_lock); } @@ -609,7 +604,6 @@ static int virtblk_probe(struct virtio_device *vdev) mutex_init(vblk-config_lock); INIT_WORK(vblk-config_work, virtblk_config_changed_work); - vblk-config_enable = true; err = init_vq(vblk); if (err) @@ -771,10 +765,8 @@ static void virtblk_remove(struct virtio_device *vdev) int index = vblk-index; int refc; - /* Prevent config work handler from accessing the device. */ - mutex_lock(vblk-config_lock); - vblk-config_enable = false; - mutex_unlock(vblk-config_lock); + /* Make sure no work handler is accessing the device. */ + flush_work(vblk-config_work); del_gendisk(vblk-disk); blk_cleanup_queue(vblk-disk-queue); @@ -784,8 +776,6 @@ static void virtblk_remove(struct virtio_device *vdev) /* Stop all the virtqueues. */ vdev-config-reset(vdev); - flush_work(vblk-config_work); - refc = atomic_read(disk_to_dev(vblk-disk)-kobj.kref.refcount); put_disk(vblk-disk); vdev-config-del_vqs(vdev); @@ -805,11 +795,7 @@ static int virtblk_freeze(struct virtio_device *vdev) /* Ensure we don't receive any more interrupts */ vdev-config-reset(vdev); - /* Prevent config work handler from accessing the device. */ - mutex_lock(vblk-config_lock); - vblk-config_enable = false; - mutex_unlock(vblk-config_lock); - + /* Make sure no work handler is accessing the device. */ flush_work(vblk-config_work); blk_mq_stop_hw_queues(vblk-disk-queue); @@ -823,7 +809,6 @@ static int virtblk_restore(struct virtio_device *vdev) struct virtio_blk *vblk = vdev-priv; int ret; - vblk-config_enable = true; ret = init_vq(vdev-priv); if (!ret) blk_mq_start_stopped_hw_queues(vblk-disk-queue, true); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 01/25] virtio_pci: fix virtio spec compliance on restore
On restore, virtio pci does the following: + set features + init vqs etc - device can be used at this point! + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits This is in violation of the virtio spec, which requires the following order: - ACKNOWLEDGE - DRIVER - init vqs - DRIVER_OK This behaviour will break with hypervisors that assume spec compliant behaviour. It seems like a good idea to have this patch applied to stable branches to reduce the support butden for the hypervisors. Cc: sta...@vger.kernel.org Cc: Amit Shah amit.s...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/virtio/virtio_pci.c | 33 ++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index 3d1463c..add40d0 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -789,6 +789,7 @@ static int virtio_pci_restore(struct device *dev) struct pci_dev *pci_dev = to_pci_dev(dev); struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct virtio_driver *drv; + unsigned status = 0; int ret; drv = container_of(vp_dev-vdev.dev.driver, @@ -799,14 +800,40 @@ static int virtio_pci_restore(struct device *dev) return ret; pci_set_master(pci_dev); + /* We always start by resetting the device, in case a previous +* driver messed it up. */ + vp_reset(vp_dev-vdev); + + /* Acknowledge that we've seen the device. */ + status |= VIRTIO_CONFIG_S_ACKNOWLEDGE; + vp_set_status(vp_dev-vdev, status); + + /* Maybe driver failed before freeze. +* Restore the failed status, for debugging. */ + status |= vp_dev-saved_status VIRTIO_CONFIG_S_FAILED; + vp_set_status(vp_dev-vdev, status); + + if (!drv) + return 0; + + /* We have a driver! */ + status |= VIRTIO_CONFIG_S_DRIVER; + vp_set_status(vp_dev-vdev, status); + vp_finalize_features(vp_dev-vdev); - if (drv drv-restore) + if (drv-restore) { ret = drv-restore(vp_dev-vdev); + if (ret) { + status |= VIRTIO_CONFIG_S_FAILED; + vp_set_status(vp_dev-vdev, status); + return ret; + } + } /* Finally, tell the device we're all set */ - if (!ret) - vp_set_status(vp_dev-vdev, vp_dev-saved_status); + status |= VIRTIO_CONFIG_S_DRIVER_OK; + vp_set_status(vp_dev-vdev, status); return ret; } -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/5] KVM: x86: Emulator performs code segment checks on read access
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Il 12/10/2014 08:57, Nadav Amit ha scritto: Looks good. I’ll give it a try but it is hard to give a definitive answer, since the emulator is still bug-ridden. Yes, we need to write unit tests for this, especially the conforming case. A bit of a pain to get kvm-unit-tests in ring 3 (access.flat does it), but I'll give it a shot. Paolo -BEGIN PGP SIGNATURE- Version: GnuPG v2 iQEcBAEBAgAGBQJUOnBCAAoJEBRUblpOawnXLHEH/3zlIsVFow9IWrZsxaZopWy5 DKncQriHnhsyc6W2U9oFvNgB/7+dTWscdR58jnKLr/Qt64DGH01pq4MvisdhV+31 53pv+CycUgs85EoZkA7MzArbT1Tb/gd/KB8QoqdKIC4+bNEd6JMydcsq5d6nMgOd yWLYjcYzWzzmJSNCC7UYOtN4SB4brC5RyITLq+CgT4ufSPHtBYxGd8fSHpzvJzTU T1hqelYcLRFGzoPR4ux4SP8EgXle+sslFV4KAyXkucLIafmeekcmR/AO8hy84TXj Kcpt6Y3hsY8pWXM3YB4LF/7+NuaPu/Ud+2VkbfuFhq5gUtVLwjtWtA32IlYdFEc= =BhUU -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Use VM as main machine
I would like to install a simple system on the hardware level, and do most of my work in one of the VM's. In the past I have accessed VM's graphically using vnc, but if I did that my base system (dom0 in Xen, I think) would need to run X server, which would complicate it and render the whole setup vulnerable if there was a problem with X. Also, I've had trouble in the past using evolution remotely. Is there a way to hook a VM more directly to the display, keyboard and mouse? http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM seemed potentially relevant, but tracing through the references I didn't find FLReset+ on any PCI devices other than audio, so that seems to be out (although the reference page http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F says that it is out of date). My immediate interest is a system that's about 5 years old, running Xeon E5420's with VT-x. It has a VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV570 [Radeon X1950 Pro] (prog-if 00 [VGA controller]) I'm also considering doing the same on a more recent system with Xeon E3-1245 v 2. It has VT-x and Vt-d, with Intel® HD Graphics P4000. I might get an external video card later. It also doesn't show FLReset+ except for audio. Thanks. Ross Boylan-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Use VM as main machine
On Sun, 2014-10-12 at 18:21 +, Boylan, Ross wrote: I would like to install a simple system on the hardware level, and do most of my work in one of the VM's. In the past I have accessed VM's graphically using vnc, but if I did that my base system (dom0 in Xen, I think) would need to run X server, which would complicate it and render the whole setup vulnerable if there was a problem with X. Also, I've had trouble in the past using evolution remotely. Is there a way to hook a VM more directly to the display, keyboard and mouse? http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM seemed potentially relevant, but tracing through the references I didn't find FLReset+ on any PCI devices other than audio, so that seems to be out (although the reference page http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F says that it is out of date). My immediate interest is a system that's about 5 years old, running Xeon E5420's with VT-x. It has a VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV570 [Radeon X1950 Pro] (prog-if 00 [VGA controller]) I'm also considering doing the same on a more recent system with Xeon E3-1245 v 2. It has VT-x and Vt-d, with Intel® HD Graphics P4000. I might get an external video card later. It also doesn't show FLReset+ except for audio. FLR is irrelevant, no video devices have it afaict. Neither of your video cards work with current VGA/OVMF device assignment either, IGD may work in the relative short term, Radeon X1xxx devices are too old to bother with. See this thread: https://bbs.archlinux.org/viewtopic.php?id=162768 and this blog: http://vfio.blogspot.com/ For an idea of what's currently possible. Be aware that all of this is bleeding edge and chances of needing to build your own kernel/QEMU are high. I'd recommend something like an HD5450 as the the minimum Radeon class card and an 8-series Nvidia as the minimum on that end. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Use VM as main machine
Thanks for the info, and thanks also for working on the software. It sounds as if the short answer is I can't do what I want with my current hardware; am I understanding correcty? And to get it to work I need two video cards? Do they both need to support the newer capabilities, or is it enough that one of them does? It looks as if the archlinux blog describes binding the input devices as well as the video to the VM. If you need to access the host system, how do you do it? shutdown the VM? ssh to the host from the VM? That seems to raise another challenge: to bring up the guests I first need to decrypt a device in the host. So it would be helpful to see the messages from the host and essential to type in a pass-phrase for the host. But it looks as one must specify options for remapping to the host kernel on boot. If my keyboard can only talk to the guest I won't be able to type the pass-phrase, which I must do to bring the guest up. I hope I've misunderstood; can anyone clear these issues up for me? If FLReset is irrelevant, why were those pages discussiing it? The (outdated) http://wiki.xenproject.org/wiki/VTdHowTo says Only devices with FLR capabilities are supported.. Is that just a Xen thing? The Radeon video card wasn't close to cutting edge even when I put it in; I picked it because at the time it was the highest Radeon said to have good open-source support. The blog was also interesting because at home it would be great to play games in a virtualized MS-Windows, and that clearly requires giving the VM pretty direct access to the video card. Ross From: Alex Williamson [alex.william...@redhat.com] Sent: Sunday, October 12, 2014 1:35 PM To: Boylan, Ross Cc: kvm@vger.kernel.org Subject: Re: Use VM as main machine On Sun, 2014-10-12 at 18:21 +, Boylan, Ross wrote: I would like to install a simple system on the hardware level, and do most of my work in one of the VM's. In the past I have accessed VM's graphically using vnc, but if I did that my base system (dom0 in Xen, I think) would need to run X server, which would complicate it and render the whole setup vulnerable if there was a problem with X. Also, I've had trouble in the past using evolution remotely. Is there a way to hook a VM more directly to the display, keyboard and mouse? http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM seemed potentially relevant, but tracing through the references I didn't find FLReset+ on any PCI devices other than audio, so that seems to be out (although the reference page http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F says that it is out of date). My immediate interest is a system that's about 5 years old, running Xeon E5420's with VT-x. It has a VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV570 [Radeon X1950 Pro] (prog-if 00 [VGA controller]) I'm also considering doing the same on a more recent system with Xeon E3-1245 v 2. It has VT-x and Vt-d, with Intel® HD Graphics P4000. I might get an external video card later. It also doesn't show FLReset+ except for audio. FLR is irrelevant, no video devices have it afaict. Neither of your video cards work with current VGA/OVMF device assignment either, IGD may work in the relative short term, Radeon X1xxx devices are too old to bother with. See this thread: https://bbs.archlinux.org/viewtopic.php?id=162768 and this blog: http://vfio.blogspot.com/ For an idea of what's currently possible. Be aware that all of this is bleeding edge and chances of needing to build your own kernel/QEMU are high. I'd recommend something like an HD5450 as the the minimum Radeon class card and an 8-series Nvidia as the minimum on that end. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 86161] New: PROBLEM: On KVM, Window 7 32bit guests sometimes run into blue screen(0x0000005c) during reboot
https://bugzilla.kernel.org/show_bug.cgi?id=86161 Bug ID: 86161 Summary: PROBLEM: On KVM, Window 7 32bit guests sometimes run into blue screen(0x005c) during reboot Product: Virtualization Version: unspecified Kernel Version: 3.10.0 and newer Hardware: x86-64 OS: Linux Tree: Mainline Status: NEW Severity: high Priority: P1 Component: kvm Assignee: virtualization_...@kernel-bugs.osdl.org Reporter: ng...@qq.com Regression: No Created attachment 153371 -- https://bugzilla.kernel.org/attachment.cgi?id=153371action=edit The blue screen snapshot When running Windows 7 32bit guests on qemu-kvm, sometimes the guests run into blue screen during reboot(Windows' reboot instead of qemu's). Blue screen stop code: 0x005c(0x010b,0x0003,0x,0x) Once the error happens to a guest, use qemu 'system_reset' command to restart it, the error can be reproduced. Unfortunately, there is neither mini dump nor memory dump, only the above stop code. Kernel versions: I have tested 3.10.0, 3.10.32, 3.10.57, 3.14.21. All of them are affected. 3.9.11 and 3.8.13 are not affected. I guest the bug is introduced in 3.10.0, and exists in all 3.10 and newer versions. Guest OS: Currently only Windows 7 32bit is affected. Windows xp 32bit and Windows 7 64 bit are not affected. Host CPU: Intel(R) Xeon(R) CPU E5-2620 v2 Host OS: CentOS 6.5 Recreate steps: This error can not be recreated every reboot. With below scenario, it can be recreated within 30 minutes: * Prepare a Windows 7 32bit image as base image. Put below content into a batch file(like auto_restart.bat): shutdown /r /t 60 and put the batch file into the Windows startup folder, or anywhere so long as it will be called after system startup. This will make the Windows guests automatically restart after system startup in 60 seconds. * And create 30 child images as below: qemu-img create -f qcow2 -o backing_file=win7_base,size=20G inst1.img * Run all the guests: nohup /usr/bin/qemu-system-x86_64 -name inst1 -machine pc-i440fx-1.5,accel=kvm,usb=off -m 1024 -smp 1,sockets=1,cores=1,threads=1 -drive file=inst1.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=writeback -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc 0.0.0.0:6100 -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/inst1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 21 /tmp/inst.out Remember to change the vnc port for each guest. * After 30 minutes, check all the guests with vnc client, you will find some of the guests have run into blue screen. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/5] KVM: x86: Emulator performs code segment checks on read access
Il 13/10/2014 01:15, Nadav Amit ha scritto: I think the problem might be even more fundamental. According to the SDM, the privilege level checks (CPL/DPL/RPL) are only performed when the segment is loaded; I see no reference to privilege checks when data is accessed. You should be able to load a segment with DPL=0 while you are in CPL=0, then change CPL to 3 and still access the segment (obviously, it is not the best practice). This can be tested without invoking the emulator... Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html