Re: [PATCH v2 2/5] KVM: x86: Emulator performs code segment checks on read access

2014-10-12 Thread Nadav Amit
Radim, Paolo, Sorry for the late responses (due to holidays)…

On Oct 11, 2014, at 12:39 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 10/10/2014 17:54, Radim Krčmář ha scritto:
 
 One exception is the case of conforming code segment. The SDM says: Use a
 code-segment override prefix (CS) to read a readable...  [it is] valid 
 because
 the DPL of the code segment selected by the CS register is the same as the
 CPL. This is misleading since CS.DPL may be lower (numerically) than CPL, 
 and
 CS would still be accessible.  The emulator should avoid privilage level 
 checks
 for data reads using CS.
 Ah, after stripping faulty presumptions, I'm not sure this change is
 enough ... shouldn't we also skip the check on conforming code segments?
 
 Method 2 is always valid because the privilege level of a conforming
 code segment is effectively the same as the CPL, regardless of its DPL.
 
 Radim is right; we need to skip the check on conforming code segments 
 and, once we do that, checking addr.seg is not necessary anymore.  That 
 is because, for a CS override on a nonconforming code segment, at the 
 time we fetch the instruction we know that cpl == desc.dpl.  The less 
 restrictive data segment check (cpl = desc.dpl) thus always passes.
Yes. I was wrong, assuming the code-segment checks are just a derivative of the 
data segment checks.


 
 Let's put together this check and the readability check, too, since
 we are adding another if (fetch).
 
 Can you guys think of a way to simplify the following untested patch?
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 03954f7900f5..9f3e33551db9 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -638,9 +638,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
   if ctxt-mode != X86EMUL_MODE_REAL)  (desc.type  8))
   || !(desc.type  2))  write)
   goto bad;
 - /* unreadable code segment */
 - if (!fetch  (desc.type  8)  !(desc.type  2))
 - goto bad;
   lim = desc_limit_scaled(desc);
   if ((ctxt-mode == X86EMUL_MODE_REAL)  !fetch 
   (ctxt-d  NoBigReal)) {
 @@ -660,17 +657,40 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
   goto bad;
   }
   cpl = ctxt-ops-cpl(ctxt);
 - if (!(desc.type  8)) {
 - /* data segment */
 + if (fetch  (desc.type  8)) {
 + if (!(desc.type  4)) {
 + /* nonconforming code segment */
 + if (cpl != desc.dpl)
 + goto bad;
 + break;
 + } else {
 + /* conforming code segment */
 + if (cpl  desc.dpl)
 + goto bad;
 + break;
 + }
 + }
 +
 + if (likely(!(desc.type  8) || (desc.type  6) == 2)) {
 + /*
 +  * Data segment or readable, nonconforming code
 +  * segment.  The SDM mentions that access through
 +  * a code-segment override prefix is always valid.
 +  * This really only matters for conforming code
 +  * segments (checked below, and always valid anyway):
 +  * for nonconforming ones, cpl == desc.dpl was checked
 +  * when fetching the instruction, meaning the following
 +  * test will always pass too.
 +  */
   if (cpl  desc.dpl)
   goto bad;
 - } else if ((desc.type  8)  !(desc.type  4)) {
 - /* nonconforming code segment */
 - if (cpl != desc.dpl)
 - goto bad;
 - } else if ((desc.type  8)  (desc.type  4)) {
 - /* conforming code segment */
 - if (cpl  desc.dpl)
 + } else {
 + /*
 +  * These are the (rare) cases that do not behave
 +  * like data segments: nonreadable code segments (bad)
 +  * and readable, conforming code segments (good).
 +  */
 + if (!(desc.type  2))
   goto bad;
   }
   break;

Looks good. I’ll give it a try but it is hard to give a definitive answer, 
since the emulator is still bug-ridden.
Please note I submitted another patch at this area (Wrong error code on limit 
violation during emulation”), so both should be merged.

Thanks,
Nadav





signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards

2014-10-12 Thread Kevin Wolf
Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben:
 In qcow2_update_snapshot_refcount - qcow2_process_discards() - 
 bdrv_discard()
 may free the Qcow2DiscardRegion which is referenced by next pointer in
 qcow2_process_discards() now, in next iteration, d = next, so g_free(d)
 will double-free this Qcow2DiscardRegion.
 
 qcow2_snapshot_delete
 |- qcow2_update_snapshot_refcount
 |-- qcow2_process_discards
 |--- bdrv_discard
 | aio_poll
 |- aio_dispatch
 |-- bdrv_co_io_em_complete
 |--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is 
 bdrv_co_do_rw
 |--- g_free(d) == free first Qcow2DiscardRegion is okay
 |--- d = next;  == this set is done in QTAILQ_FOREACH_SAFE() macro.
 |--- g_free(d);  == double-free will happen if during previous iteration, 
 bdrv_discard had free this object.

Do you have a reproducer for this or did code review lead you to this?

At the moment I can't see how bdrv_discard(bs-file) could ever free a
Qcow2DiscardRegion of bs, as it's working on a completely different
BlockDriverState (which usually won't even be a qcow2 one).

 bdrv_co_do_rw
 |- bdrv_co_do_writev
 |-- bdrv_co_do_pwritev
 |--- bdrv_aligned_pwritev
 | qcow2_co_writev
 |- qcow2_alloc_cluster_link_l2
 |-- qcow2_free_any_clusters
 |--- qcow2_free_clusters
 | update_refcount
 |- qcow2_process_discards
 |-- g_free(d)  == In next iteration, this Qcow2DiscardRegion will be 
 double-free.

This shouldn't happen in a nested call either, as s-lock can't be taken
recursively.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [Bug?] qemu abort when trying to passthrough BCM5719 Gigabit Ethernet

2014-10-12 Thread Michael S. Tsirkin
On Sat, Oct 11, 2014 at 01:41:48AM -0600, Alex Williamson wrote:
 On Sat, 2014-10-11 at 13:58 +0800, zhanghailiang wrote:
  Hi all,
  
  When i try to passthrough BCM5719 Gigabit Ethernet to guest using the qemu 
  master branch, it aborted,
  and show kvm_set_phys_mem:error registering slot:Bad Address.
  
  qemu command:
  #./qemu/qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 
  -vnc :99 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3  -drive 
  file=/home/suse11_sp3_64,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native
   -device scsi-hd,bus=scsi0.0,channel=0,scsi-
  id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -device 
  pci-assign,host=01:00.1,id=mydevice -net none
  
  info about guest and host:
  host OS: 3.16.5
  *guest OS: Novell SuSE Linux Enterprise Server 11 SP3*
  #cat /proc/cpuinfo
  processor   : 31
  vendor_id   : GenuineIntel
  cpu family  : 6
  model   : 62
  model name  : Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
  stepping: 4
  microcode   : 0x416
  cpu MHz : 1926.875
  cache size  : 20480 KB
  physical id : 1
  siblings: 16
  core id : 7
  cpu cores   : 8
  apicid  : 47
  initial apicid  : 47
  fpu : yes
  fpu_exception   : yes
  cpuid level : 13
  wp  : yes
  flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
  cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
  pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
  xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
  ds_cpl vmx
  smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
  tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt 
  pln
  pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
  bogomips: 4005.35
  clflush size: 64
  cache_alignment : 64
  address sizes   : 46 bits physical, 48 bits virtual
  power management:
  
  gdb info:
  (gdb) bt
  #0  0x71ce9989 in raise () from /usr/lib64/libc.so.6
  #1  0x71ceb098 in abort () from /usr/lib64/libc.so.6
  #2  0x556275cf in kvm_set_phys_mem (section=0x7fffedcea790, 
  add=true) at /home/qemu/qemu/kvm-all.c:711
  #3  0x5562980f in address_space_update_topology_pass 
  (as=as@entry=0x55d01ca0 address_space_memory, 
  adding=adding@entry=true,
   new_view=optimized out, new_view=optimized out, 
  old_view=0x7fffe8022a90, old_view=0x7fffe8022a90) at 
  /home/qemu/qemu/memory.c:752
  #4  0x5562b910 in address_space_update_topology (as=0x55d01ca0 
  address_space_memory) at /home/qemu/qemu/memory.c:767
  #5  memory_region_transaction_commit () at /home/qemu/qemu/memory.c:808
  #6  0x557a75b4 in pci_update_mappings (d=0x562ba9f0) at 
  hw/pci/pci.c:1113
  #7  0x557a7932 in pci_default_write_config 
  (d=d@entry=0x562ba9f0, addr=addr@entry=20, 
  val_in=val_in@entry=4294967295, l=l@entry=4)
   at hw/pci/pci.c:1165
  #8  0x5566c17e in assigned_dev_pci_write_config 
  (pci_dev=0x562ba9f0, address=20, val=4294967295, len=4)
   at /home/qemu/qemu/hw/i386/kvm/pci-assign.c:1196
  #9  0x55628fea in access_with_adjusted_size (addr=addr@entry=0, 
  value=value@entry=0x7fffedceaae0, size=size@entry=4,
   access_size_min=optimized out, access_size_max=optimized out, 
  access=0x55629160 memory_region_write_accessor, mr=0x56231f00)
   at /home/qemu/qemu/memory.c:480
  #10 0x5562dbf7 in memory_region_dispatch_write (size=4, 
  data=18446744073709551615, addr=0, mr=0x56231f00) at 
  /home/qemu/qemu/memory.c:1122
  #11 io_mem_write (mr=mr@entry=0x56231f00, addr=0, val=optimized out, 
  size=4) at /home/qemu/qemu/memory.c:1958
  #12 0x555f8963 in address_space_rw (as=0x55d01d80 
  address_space_io, addr=addr@entry=3324, buf=0x77fec000 
  \377\377\377\377,
   len=len@entry=4, is_write=is_write@entry=true) at 
  /home/qemu/qemu/exec.c:2145
  #13 0x55628491 in kvm_handle_io (count=1, size=4, 
  direction=optimized out, data=optimized out, port=3324) at 
  /home/qemu/qemu/kvm-all.c:1614
  #14 kvm_cpu_exec (cpu=cpu@entry=0x5620e610) at 
  /home/qemu/qemu/kvm-all.c:1771
  #15 0x55617182 in qemu_kvm_cpu_thread_fn (arg=0x5620e610) at 
  /home/qemu/qemu/cpus.c:953
  #16 0x76ba2df3 in start_thread () from /usr/lib64/libpthread.so.0
  #17 0x71daa3dd in clone () from /usr/lib64/libc.so.6
  
  messages log:
  Oct 10 07:43:18 localhost kernel: kvm: zapping shadow pages for mmio 
  generation wraparound
  Oct 10 07:43:27 localhost kernel: kvm [13251]: vcpu0 disabled perfctr 
  wrmsr: 0xc1 data 0xabcd
  Oct 10 07:43:28 localhost kernel: intel_iommu_map: iommu width (48) is not 
  sufficient for the mapped address (fe001000)
  Oct 10 07:43:28 localhost kernel: kvm_iommu_map_address:iommu failed to map 
  pfn=94880
  
  

Re: [Qemu-devel] [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards

2014-10-12 Thread Zhang Haoyu


On 2014-10-12 15:34, Kevin Wolf wrote:

Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben:

In qcow2_update_snapshot_refcount - qcow2_process_discards() - bdrv_discard()
may free the Qcow2DiscardRegion which is referenced by next pointer in
qcow2_process_discards() now, in next iteration, d = next, so g_free(d)
will double-free this Qcow2DiscardRegion.

qcow2_snapshot_delete
|- qcow2_update_snapshot_refcount
|-- qcow2_process_discards
|--- bdrv_discard
| aio_poll
|- aio_dispatch
|-- bdrv_co_io_em_complete
|--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is 
bdrv_co_do_rw
|--- g_free(d) == free first Qcow2DiscardRegion is okay
|--- d = next;  == this set is done in QTAILQ_FOREACH_SAFE() macro.
|--- g_free(d);  == double-free will happen if during previous iteration, 
bdrv_discard had free this object.

Do you have a reproducer for this or did code review lead you to this?
This problem can be reproduced with loop of savevm - delvm - savem - 
delvm ..., about 4 hours.

When I delete the vm snapshot, qemu crashed with a core file,
I debug the core file and find the double-free and the stack.

So I add a breakpoint at g_free(d);, and find that indeed a double-free 
happened,  twice free with the same address.

And only the first discard region have not happened with double-free.


At the moment I can't see how bdrv_discard(bs-file) could ever free a
Qcow2DiscardRegion of bs, as it's working on a completely different
BlockDriverState (which usually won't even be a qcow2 one).
I think the aio_context in bdrv_discard - aio_poll(aio_context, true) 
is the qemu_aio_context,
no matter the bs or bs-file passed to bdrv_discard, so 
aio_poll(aio_context) will poll all of the aio.



bdrv_co_do_rw
|- bdrv_co_do_writev
|-- bdrv_co_do_pwritev
|--- bdrv_aligned_pwritev
| qcow2_co_writev
|- qcow2_alloc_cluster_link_l2
|-- qcow2_free_any_clusters
|--- qcow2_free_clusters
| update_refcount
|- qcow2_process_discards
|-- g_free(d)  == In next iteration, this Qcow2DiscardRegion will be 
double-free.

This shouldn't happen in a nested call either, as s-lock can't be taken
recursively.
Could you detail how s-lock prevent that, above stack is from the gdb, 
when I add a breakpoint in g_free(d).


Thanks,
Zhang Haoyu


Kevin




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: drop unsupported capabilities, fix documentation

2014-10-12 Thread Michael S. Tsirkin
No kernel ever reported KVM_CAP_DEVICE_MSIX, KVM_CAP_DEVICE_MSI,
KVM_CAP_DEVICE_ASSIGNMENT, KVM_CAP_DEVICE_DEASSIGNMENT.

This makes the documentation wrong, and no application ever
written to use these capabilities has a chance to work correctly.
The only way to detect support is to try, and test errno for ENOTTY.
That's unfortunate, but we can't fix the past.

Document the actual semantics, and drop the definitions from
the exported header to make it easier for application
developers to note and fix the bug.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/uapi/linux/kvm.h  |  8 
 Documentation/virtual/kvm/api.txt | 40 ---
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cf3a2ff..98d72f7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -647,11 +647,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_MP_STATE 14
 #define KVM_CAP_COALESCED_MMIO 15
 #define KVM_CAP_SYNC_MMU 16  /* Changes to host mmap are reflected in guest */
-#define KVM_CAP_DEVICE_ASSIGNMENT 17
 #define KVM_CAP_IOMMU 18
-#ifdef __KVM_HAVE_MSI
-#define KVM_CAP_DEVICE_MSI 20
-#endif
 /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21
 #ifdef __KVM_HAVE_USER_NMI
@@ -665,10 +661,6 @@ struct kvm_ppc_smmu_info {
 #endif
 #define KVM_CAP_IRQ_ROUTING 25
 #define KVM_CAP_IRQ_INJECT_STATUS 26
-#define KVM_CAP_DEVICE_DEASSIGNMENT 27
-#ifdef __KVM_HAVE_MSIX
-#define KVM_CAP_DEVICE_MSIX 28
-#endif
 #define KVM_CAP_ASSIGN_DEV_IRQ 29
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index beae3fd..0fe195c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -68,9 +68,12 @@ description:
 
   Capability: which KVM extension provides this ioctl.  Can be 'basic',
   which means that is will be provided by any kernel that supports
-  API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
+  API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
   means availability needs to be checked with KVM_CHECK_EXTENSION
-  (see section 4.4).
+  (see section 4.4), or 'none' which means that while not all kernels
+  support this ioctl, there's no capability bit to check its
+  availability: for kernels that don't support the ioctl,
+  the ioctl returns -ENOTTY.
 
   Architectures: which instruction set architectures provide this ioctl.
   x86 includes both i386 and x86_64.
@@ -1257,7 +1260,7 @@ The flags bitmap is defined as:
 
 4.48 KVM_ASSIGN_PCI_DEVICE
 
-Capability: KVM_CAP_DEVICE_ASSIGNMENT
+Capability: none
 Architectures: x86 ia64
 Type: vm ioctl
 Parameters: struct kvm_assigned_pci_dev (in)
@@ -1298,10 +1301,16 @@ Only PCI header type 0 devices with PCI BAR resources 
are supported by
 device assignment.  The user requesting this ioctl must have read/write
 access to the PCI sysfs resource files associated with the device.
 
+Errors:
+  ENOTTY: kernel does not support this ioctl
+
+  Other error conditions may be defined by individual device types or
+  have their standard meanings.
+
 
 4.49 KVM_DEASSIGN_PCI_DEVICE
 
-Capability: KVM_CAP_DEVICE_DEASSIGNMENT
+Capability: none
 Architectures: x86 ia64
 Type: vm ioctl
 Parameters: struct kvm_assigned_pci_dev (in)
@@ -1309,9 +1318,14 @@ Returns: 0 on success, -1 on error
 
 Ends PCI device assignment, releasing all associated resources.
 
-See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is
+See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is
 used in kvm_assigned_pci_dev to identify the device.
 
+Errors:
+  ENOTTY: kernel does not support this ioctl
+
+  Other error conditions may be defined by individual device types or
+  have their standard meanings.
 
 4.50 KVM_ASSIGN_DEV_IRQ
 
@@ -1346,6 +1360,12 @@ The following flags are defined:
 It is not valid to specify multiple types per host or guest IRQ. However, the
 IRQ type of host and guest can differ or can even be null.
 
+Errors:
+  ENOTTY: kernel does not support this ioctl
+
+  Other error conditions may be defined by individual device types or
+  have their standard meanings.
+
 
 4.51 KVM_DEASSIGN_DEV_IRQ
 
@@ -1423,7 +1443,7 @@ struct kvm_irq_routing_s390_adapter {
 
 4.53 KVM_ASSIGN_SET_MSIX_NR
 
-Capability: KVM_CAP_DEVICE_MSIX
+Capability: none
 Architectures: x86 ia64
 Type: vm ioctl
 Parameters: struct kvm_assigned_msix_nr (in)
@@ -1445,7 +1465,7 @@ struct kvm_assigned_msix_nr {
 
 4.54 KVM_ASSIGN_SET_MSIX_ENTRY
 
-Capability: KVM_CAP_DEVICE_MSIX
+Capability: none
 Architectures: x86 ia64
 Type: vm ioctl
 Parameters: struct kvm_assigned_msix_entry (in)
@@ -1461,6 +1481,12 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+Errors:
+  ENOTTY: kernel 

Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors

2014-10-12 Thread Michael S. Tsirkin
On Sat, Oct 11, 2014 at 03:16:44PM +0800, Jason Wang wrote:
 Below should be useful for some experiments Jason is doing.
 I thought I'd send it out for early review/feedback.
 
 event idx feature allows us to defer interrupts until
 a specific # of descriptors were used.
 Sometimes it might be useful to get an interrupt after
 a specific descriptor, regardless.
 This adds a descriptor flag for this, and an API
 to create an urgent output descriptor.
 This is still an RFC:
 we'll need a feature bit for drivers to detect this,
 but we've run out of feature bits for virtio 0.X.
 For experimentation purposes, drivers can assume
 this is set, or add a driver-specific feature bit.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

I see that as compared to my original patch, you have
added a new flag: VRING_AVAIL_F_NO_URGENT_INTERRUPT
I don't think it's necessary, see below.

As such, I think this patch should be split:
- original patch adding support for urgent descriptors
- a patch adding virtqueue_enable/disable_cb_urgent(_prepare)?




 ---
  drivers/virtio/virtio_ring.c | 75 
 +---
  include/linux/virtio.h   | 14 
  include/uapi/linux/virtio_ring.h |  5 ++-
  3 files changed, 89 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 4d08f45a..a5188c6 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -115,6 +115,7 @@ static inline struct scatterlist *sg_next_arr(struct 
 scatterlist *sg,
  
  /* Set up an indirect table of descriptors and add it to the queue. */
  static inline int vring_add_indirect(struct vring_virtqueue *vq,
 +  bool urgent,
struct scatterlist *sgs[],
struct scatterlist *(*next)
  (struct scatterlist *, unsigned int *),
 @@ -173,6 +174,8 @@ static inline int vring_add_indirect(struct 
 vring_virtqueue *vq,
   /* Use a single buffer which doesn't continue */
   head = vq-free_head;
   vq-vring.desc[head].flags = VRING_DESC_F_INDIRECT;
 + if (urgent)
 + vq-vring.desc[head].flags |= VRING_DESC_F_URGENT;
   vq-vring.desc[head].addr = virt_to_phys(desc);
   /* kmemleak gives a false positive, as it's hidden by virt_to_phys */
   kmemleak_ignore(desc);
 @@ -185,6 +188,7 @@ static inline int vring_add_indirect(struct 
 vring_virtqueue *vq,
  }
  
  static inline int virtqueue_add(struct virtqueue *_vq,
 + bool urgent,
   struct scatterlist *sgs[],
   struct scatterlist *(*next)
 (struct scatterlist *, unsigned int *),
 @@ -227,7 +231,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
   /* If the host supports indirect descriptor tables, and we have multiple
* buffers, then go indirect. FIXME: tune this threshold */
   if (vq-indirect  total_sg  1  vq-vq.num_free) {
 - head = vring_add_indirect(vq, sgs, next, total_sg, total_out,
 + head = vring_add_indirect(vq, urgent, sgs, next, total_sg, 
 total_out,
 total_in,
 out_sgs, in_sgs, gfp);
   if (likely(head = 0))
 @@ -256,6 +260,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
   for (n = 0; n  out_sgs; n++) {
   for (sg = sgs[n]; sg; sg = next(sg, total_out)) {
   vq-vring.desc[i].flags = VRING_DESC_F_NEXT;
 + if (urgent) {
 + vq-vring.desc[head].flags |= 
 VRING_DESC_F_URGENT;
 + urgent = false;
 + }
   vq-vring.desc[i].addr = sg_phys(sg);
   vq-vring.desc[i].len = sg-length;
   prev = i;
 @@ -265,6 +273,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
   for (; n  (out_sgs + in_sgs); n++) {
   for (sg = sgs[n]; sg; sg = next(sg, total_in)) {
   vq-vring.desc[i].flags = 
 VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
 + if (urgent) {
 + vq-vring.desc[head].flags |= 
 VRING_DESC_F_URGENT;
 + urgent = false;
 + }
   vq-vring.desc[i].addr = sg_phys(sg);
   vq-vring.desc[i].len = sg-length;
   prev = i;
 @@ -305,6 +317,8 @@ add_head:
  
  /**
   * virtqueue_add_sgs - expose buffers to other end
 + * @urgent: in case virtqueue_enable_cb_delayed was called, cause an 
 interrupt
 + *  after this descriptor was completed
   * @vq: the struct virtqueue we're talking about.
   * @sgs: array of terminated scatterlists.
   * 

Re: [PATCH 3.10] vhost-net: backport extend device allocation to 3.10

2014-10-12 Thread Michael S. Tsirkin
On Thu, Oct 09, 2014 at 08:41:23AM +0400, Dmitry Petuhov wrote:
 From: Michael S. Tsirkin m...@redhat.com
 
 upstream commit 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5
 
 Michael Mueller provided a patch to reduce the size of
 vhost-net structure as some allocations could fail under
 memory pressure/fragmentation. We are still left with
 high order allocations though.
 
 This patch is handling the problem at the core level, allowing
 vhost structures to use vmalloc() if kmalloc() failed.
 
 As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
 to kzalloc() flags to do this fallback only when really needed.
 
 People are still looking at cleaner ways to handle the problem
 at the API level, probably passing in multiple iovecs.
 This hack seems consistent with approaches
 taken since then by drivers/vhost/scsi.c and net/core/dev.c
 
 Based on patch by Romain Francoise.
 
 Cc: Michael Mueller m...@linux.vnet.ibm.com
 Signed-off-by: Romain Francoise rom...@orebokech.com
 Acked-by: Michael S. Tsirkin m...@redhat.com
 [mityapetuhov: backport to v3.10: vhost_net_free() in one more place]
 Signed-off-by: Dmitry Petuhov mityapetu...@gmail.com

Sounds reasonable.

Acked-by: Michael S. Tsirkin m...@redhat.com

 ---
 diff -uprN a/drivers/vhost/net.c b/drivers/vhost/net.c
 --- a/drivers/vhost/net.c 2014-10-09 06:45:08.336283258 +0400
 +++ b/drivers/vhost/net.c 2014-10-09 06:51:21.796266607 +0400
 @@ -18,6 +18,7 @@
  #include linux/rcupdate.h
  #include linux/file.h
  #include linux/slab.h
 +#include linux/vmalloc.h
  
  #include linux/net.h
  #include linux/if_packet.h
 @@ -707,18 +708,30 @@ static void handle_rx_net(struct vhost_w
   handle_rx(net);
  }
  
 +static void vhost_net_free(void *addr)
 +{
 + if (is_vmalloc_addr(addr))
 + vfree(addr);
 + else
 + kfree(addr);
 +}
 +
  static int vhost_net_open(struct inode *inode, struct file *f)
  {
 - struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
 + struct vhost_net *n;
   struct vhost_dev *dev;
   struct vhost_virtqueue **vqs;
   int r, i;
  
 - if (!n)
 - return -ENOMEM;
 + n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
 + if (!n) {
 + n = vmalloc(sizeof *n);
 + if (!n)
 + return -ENOMEM;
 + }
   vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
   if (!vqs) {
 - kfree(n);
 + vhost_net_free(n);
   return -ENOMEM;
   }
  
 @@ -737,7 +750,7 @@ static int vhost_net_open(struct inode *
   }
   r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
   if (r  0) {
 - kfree(n);
 + vhost_net_free(n);
   kfree(vqs);
   return r;
   }
 @@ -840,7 +853,7 @@ static int vhost_net_release(struct inod
* since jobs can re-queue themselves. */
   vhost_net_flush(n);
   kfree(n-dev.vqs);
 - kfree(n);
 + vhost_net_free(n);
   return 0;
  }
  
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/25] virtio_net: minor cleanup

2014-10-12 Thread Michael S. Tsirkin
goto done;
done:
return;
is ugly, it was put there to make diff review easier.
replace by open-coded return.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/net/virtio_net.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 23e4a69..ef04d23 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1403,7 +1403,7 @@ static void virtnet_config_changed_work(struct 
work_struct *work)
 
if (virtio_cread_feature(vi-vdev, VIRTIO_NET_F_STATUS,
 struct virtio_net_config, status, v)  0)
-   goto done;
+   return;
 
if (v  VIRTIO_NET_S_ANNOUNCE) {
netdev_notify_peers(vi-dev);
@@ -1414,7 +1414,7 @@ static void virtnet_config_changed_work(struct 
work_struct *work)
v = VIRTIO_NET_S_LINK_UP;
 
if (vi-status == v)
-   goto done;
+   return;
 
vi-status = v;
 
@@ -1425,8 +1425,6 @@ static void virtnet_config_changed_work(struct 
work_struct *work)
netif_carrier_off(vi-dev);
netif_tx_stop_all_queues(vi-dev);
}
-done:
-   return;
 }
 
 static void virtnet_config_changed(struct virtio_device *vdev)
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 24/25] virtio_scsi: drop scan callback

2014-10-12 Thread Michael S. Tsirkin
Enable VQs early like we do for restore.
This makes it possible to drop the scan callback,
moving scanning into the probe function, and making
code simpler.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 327eba0..5f022ff 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -860,17 +860,6 @@ static void virtscsi_init_vq(struct virtio_scsi_vq 
*virtscsi_vq,
virtscsi_vq-vq = vq;
 }
 
-static void virtscsi_scan(struct virtio_device *vdev)
-{
-   struct Scsi_Host *shost = virtio_scsi_host(vdev);
-   struct virtio_scsi *vscsi = shost_priv(shost);
-
-   if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
-   virtscsi_kick_event_all(vscsi);
-
-   scsi_scan_host(shost);
-}
-
 static void virtscsi_remove_vqs(struct virtio_device *vdev)
 {
struct Scsi_Host *sh = virtio_scsi_host(vdev);
@@ -1007,10 +996,13 @@ static int virtscsi_probe(struct virtio_device *vdev)
err = scsi_add_host(shost, vdev-dev);
if (err)
goto scsi_add_host_failed;
-   /*
-* scsi_scan_host() happens in virtscsi_scan() via virtio_driver-scan()
-* after VIRTIO_CONFIG_S_DRIVER_OK has been set..
-*/
+
+   virtio_enable_vqs_early(vdev);
+
+   if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
+   virtscsi_kick_event_all(vscsi);
+
+   scsi_scan_host(shost);
return 0;
 
 scsi_add_host_failed:
@@ -1090,7 +1082,6 @@ static struct virtio_driver virtio_scsi_driver = {
.driver.owner = THIS_MODULE,
.id_table = id_table,
.probe = virtscsi_probe,
-   .scan = virtscsi_scan,
 #ifdef CONFIG_PM_SLEEP
.freeze = virtscsi_freeze,
.restore = virtscsi_restore,
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 25/25] virtio-rng: refactor probe error handling

2014-10-12 Thread Michael S. Tsirkin
Code like
vi-vq = NULL;
kfree(vi)
does not make sense.

Clean it up, use goto error labels for cleanup.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/char/hw_random/virtio-rng.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 132c9cc..72295ea 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -109,8 +109,8 @@ static int probe_common(struct virtio_device *vdev)
 
vi-index = index = ida_simple_get(rng_index_ida, 0, 0, GFP_KERNEL);
if (index  0) {
-   kfree(vi);
-   return index;
+   err = index;
+   goto err_ida;
}
sprintf(vi-name, virtio_rng.%d, index);
init_completion(vi-have_data);
@@ -128,13 +128,16 @@ static int probe_common(struct virtio_device *vdev)
vi-vq = virtio_find_single_vq(vdev, random_recv_done, input);
if (IS_ERR(vi-vq)) {
err = PTR_ERR(vi-vq);
-   vi-vq = NULL;
-   kfree(vi);
-   ida_simple_remove(rng_index_ida, index);
-   return err;
+   goto err_find;
}
 
return 0;
+
+err_find:
+   ida_simple_remove(rng_index_ida, index);
+err_ida:
+   kfree(vi);
+   return err;
 }
 
 static void remove_common(struct virtio_device *vdev)
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 23/25] virtio_balloon: enable VQs early on restore

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.

To fix, call virtio_enable_vqs_early before using VQ.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 25ebe8e..9629fad 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -538,6 +538,8 @@ static int virtballoon_restore(struct virtio_device *vdev)
if (ret)
return ret;
 
+   virtio_enable_vqs_early(vdev);
+
fill_balloon(vb, towards_target(vb));
update_balloon_size(vb);
return 0;
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 22/25] virtio_scsi: fix race on device removal

2014-10-12 Thread Michael S. Tsirkin
We cancel event work on device removal, but an interrupt
could trigger immediately after this, and queue it
again.

To fix, set a flag.

Loosely based on patch by Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 501838d..327eba0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -110,6 +110,9 @@ struct virtio_scsi {
/* CPU hotplug notifier */
struct notifier_block nb;
 
+   /* Protected by event_vq lock */
+   bool stop_events;
+
struct virtio_scsi_vq ctrl_vq;
struct virtio_scsi_vq event_vq;
struct virtio_scsi_vq req_vqs[];
@@ -303,6 +306,11 @@ static void virtscsi_cancel_event_work(struct virtio_scsi 
*vscsi)
 {
int i;
 
+   /* Stop scheduling work before calling cancel_work_sync.  */
+   spin_lock_irq(vscsi-event_vq.vq_lock);
+   vscsi-stop_events = true;
+   spin_unlock_irq(vscsi-event_vq.vq_lock);
+
for (i = 0; i  VIRTIO_SCSI_EVENT_LEN; i++)
cancel_work_sync(vscsi-event_list[i].work);
 }
@@ -390,7 +398,8 @@ static void virtscsi_complete_event(struct virtio_scsi 
*vscsi, void *buf)
 {
struct virtio_scsi_event_node *event_node = buf;
 
-   queue_work(system_freezable_wq, event_node-work);
+   if (!vscsi-stop_events)
+   queue_work(system_freezable_wq, event_node-work);
 }
 
 static void virtscsi_event_done(struct virtqueue *vq)
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 19/25] virtio_console: enable VQs early on restore

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio console violated this
rule by adding inbufs, which causes the VQ to be used directly within
restore.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/char/virtio_console.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 6ebe8f6..2ae843f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -2184,6 +2184,8 @@ static int virtcons_restore(struct virtio_device *vdev)
if (ret)
return ret;
 
+   virtio_enable_vqs_early(portdev-vdev);
+
if (use_multiport(portdev))
fill_queue(portdev-c_ivq, portdev-c_ivq_lock);
 
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 20/25] virtio_net: enable VQs early on restore

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio net violated this
rule by using receive VQs within restore.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3551417..6b6e136 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1912,6 +1912,8 @@ static int virtnet_restore(struct virtio_device *vdev)
if (err)
return err;
 
+   virtio_enable_vqs_early(vdev);
+
if (netif_running(vi-dev)) {
for (i = 0; i  vi-curr_queue_pairs; i++)
if (!try_fill_recv(vi-rq[i], GFP_KERNEL))
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 21/25] virito_scsi: use freezable WQ for events

2014-10-12 Thread Michael S. Tsirkin
From: Paolo Bonzini pbonz...@redhat.com

Michael S. Tsirkin noticed a race condition:
we reset device on freeze, but system WQ is still
running so it might try adding bufs to a VQ meanwhile.

To fix, switch to handling events from the freezable WQ.

Reported-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 2b565b3..501838d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -390,7 +390,7 @@ static void virtscsi_complete_event(struct virtio_scsi 
*vscsi, void *buf)
 {
struct virtio_scsi_event_node *event_node = buf;
 
-   schedule_work(event_node-work);
+   queue_work(system_freezable_wq, event_node-work);
 }
 
 static void virtscsi_event_done(struct virtqueue *vq)
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 15/25] virtio_net: fix use after free on allocation failure

2014-10-12 Thread Michael S. Tsirkin
In the extremely unlikely event that driver initialization fails after
RX buffers are added, virtio net frees RX buffers while VQs are
still active, potentially causing device to use a freed buffer.

To fix, reset device first - same as we do on device removal.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 430f3ae..3551417 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1830,6 +1830,8 @@ static int virtnet_probe(struct virtio_device *vdev)
return 0;
 
 free_recv_bufs:
+   vi-vdev-config-reset(vdev);
+
free_receive_bufs(vi);
unregister_netdev(dev);
 free_vqs:
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 14/25] 9p/trans_virtio: enable VQs early

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 net/9p/trans_virtio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 6940d8f..766ba48 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -575,6 +575,8 @@ static int p9_virtio_probe(struct virtio_device *vdev)
/* Ceiling limit to avoid denial of service attacks */
chan-p9_max_pages = nr_free_buffer_pages()/4;
 
+   virtio_enable_vqs_early(vdev);
+
mutex_lock(virtio_9p_lock);
list_add_tail(chan-chan_list, virtio_chan_list);
mutex_unlock(virtio_9p_lock);
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 17/25] virtio_blk: enable VQs early on restore

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio block violated
this rule on restore by restarting queues, which might in theory
cause the VQ to be used directly within restore.

To fix, call virtio_enable_vqs_early before using starting queues.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/block/virtio_blk.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 46b04bf..1c95af5 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -804,10 +804,13 @@ static int virtblk_restore(struct virtio_device *vdev)
int ret;
 
ret = init_vq(vdev-priv);
-   if (!ret)
-   blk_mq_start_stopped_hw_queues(vblk-disk-queue, true);
+   if (ret)
+   return ret;
+
+   virtio_enable_vqs_early(vdev);
 
-   return ret;
+   blk_mq_start_stopped_hw_queues(vblk-disk-queue, true);
+   return 0;
 }
 #endif
 
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 13/25] virtio_console: enable VQs early

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio console violated this
rule by adding inbufs, which causes the VQ to be used directly within
probe.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/char/virtio_console.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index b585b47..6ebe8f6 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
spin_lock_init(port-outvq_lock);
init_waitqueue_head(port-waitqueue);
 
+   virtio_enable_vqs_early(portdev-vdev);
+
/* Fill the in_vq with buffers so the host can send us data. */
nr_added_bufs = fill_queue(port-in_vq, port-inbuf_lock);
if (!nr_added_bufs) {
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 18/25] virtio_scsi: enable VQs early on restore

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio scsi violated
this rule on restore by kicking event vq within restore.

To fix, call virtio_enable_vqs_early before using event queue.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/scsi/virtio_scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0642ce3..2b565b3 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -1054,6 +1054,8 @@ static int virtscsi_restore(struct virtio_device *vdev)
return err;
}
 
+   virtio_enable_vqs_early(vdev);
+
if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
virtscsi_kick_event_all(vscsi);
 
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 12/25] virtio_blk: enable VQs early

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio block violated this
rule by calling add_disk, which causes the VQ to be used directly within
probe.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/block/virtio_blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 89ba8d6..46b04bf 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -719,6 +719,8 @@ static int virtblk_probe(struct virtio_device *vdev)
if (!err  opt_io_size)
blk_queue_io_opt(q, blk_size * opt_io_size);
 
+   virtio_enable_vqs_early(vdev);
+
add_disk(vblk-disk);
err = device_create_file(disk_to_dev(vblk-disk), dev_attr_serial);
if (err)
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 11/25] virtio_net: enable VQs early

2014-10-12 Thread Michael S. Tsirkin
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio net violated this
rule by using receive VQs within probe.

To fix, call virtio_enable_vqs_early before using VQs.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ef04d23..430f3ae 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1792,6 +1792,8 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free_vqs;
}
 
+   virtio_enable_vqs_early(vdev);
+
/* Last of all, set up some receive buffers. */
for (i = 0; i  vi-curr_queue_pairs; i++) {
try_fill_recv(vi-rq[i], GFP_KERNEL);
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 10/25] virtio: add API to enable VQs early

2014-10-12 Thread Michael S. Tsirkin
virtio spec 0.9.X requires DRIVER_OK to be set before
VQs are used, but some drivers use VQs before probe
function returns.
Since DRIVER_OK is set after probe, this violates the spec.

Even though under virtio 1.0 transitional devices support this
behaviour, we want to make it possible for those early callers to become
spec compliant and eventually support non-transitional devices.

Add API for drivers to call before using VQs.

Sets DRIVER_OK internally.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 include/linux/virtio_config.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index e8f8f71..e36403b 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct 
virtio_device *vdev,
return vq;
 }
 
+/**
+ * virtio_enable_vqs_early - enable vq use in probe function
+ * @vdev: the device
+ *
+ * Driver must call this to use vqs in the probe function.
+ *
+ * Note: vqs are enabled automatically after probe returns.
+ */
+static inline
+void virtio_enable_vqs_early(struct virtio_device *dev)
+{
+   unsigned status = dev-config-get_status(dev);
+
+   BUG_ON(status  VIRTIO_CONFIG_S_DRIVER_OK);
+   dev-config-set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
+}
+
 static inline
 const char *virtio_bus_name(struct virtio_device *vdev)
 {
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 06/25] virtio-blk: drop config_mutex

2014-10-12 Thread Michael S. Tsirkin
config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.

Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1
workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.

Get rid of the unnecessary lock.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/block/virtio_blk.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 91272f1a..89ba8d6 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -41,9 +41,6 @@ struct virtio_blk
/* Process context for config space updates */
struct work_struct config_work;
 
-   /* Lock for config space updates */
-   struct mutex config_lock;
-
/* What host tells us, plus 2 for header  tailer. */
unsigned int sg_elems;
 
@@ -344,8 +341,6 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
char *envp[] = { RESIZE=1, NULL };
u64 capacity, size;
 
-   mutex_lock(vblk-config_lock);
-
/* Host must always specify the capacity. */
virtio_cread(vdev, struct virtio_blk_config, capacity, capacity);
 
@@ -369,8 +364,6 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
set_capacity(vblk-disk, capacity);
revalidate_disk(vblk-disk);
kobject_uevent_env(disk_to_dev(vblk-disk)-kobj, KOBJ_CHANGE, envp);
-
-   mutex_unlock(vblk-config_lock);
 }
 
 static void virtblk_config_changed(struct virtio_device *vdev)
@@ -601,7 +594,6 @@ static int virtblk_probe(struct virtio_device *vdev)
 
vblk-vdev = vdev;
vblk-sg_elems = sg_elems;
-   mutex_init(vblk-config_lock);
 
INIT_WORK(vblk-config_work, virtblk_config_changed_work);
 
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 03/25] virtio-pci: move freeze/restore to virtio core

2014-10-12 Thread Michael S. Tsirkin
This is in preparation to extending config changed event handling
in core.
Wrapping these in an API also seems to make for a cleaner code.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 include/linux/virtio.h  |  6 +
 drivers/virtio/virtio.c | 54 +
 drivers/virtio/virtio_pci.c | 54 ++---
 3 files changed, 62 insertions(+), 52 deletions(-)

diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 3c19bd3..8df7ba8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
+ * @failed: saved value for CONFIG_S_FAILED bit (for restore)
  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
  * @config: the configuration ops for this device.
@@ -88,6 +89,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  */
 struct virtio_device {
int index;
+   bool failed;
struct device dev;
struct virtio_device_id id;
const struct virtio_config_ops *config;
@@ -109,6 +111,10 @@ void unregister_virtio_device(struct virtio_device *dev);
 void virtio_break_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
+#ifdef CONFIG_PM_SLEEP
+int virtio_device_freeze(struct virtio_device *dev);
+int virtio_device_restore(struct virtio_device *dev);
+#endif
 
 /**
  * virtio_driver - operations for a virtio I/O driver
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3980687..8216b73 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -248,6 +248,60 @@ void virtio_config_changed(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_config_changed);
 
+#ifdef CONFIG_PM_SLEEP
+int virtio_device_freeze(struct virtio_device *dev)
+{
+   struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
+
+   dev-failed = dev-config-get_status(dev)  VIRTIO_CONFIG_S_FAILED;
+
+   if (drv  drv-freeze)
+   return drv-freeze(dev);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_device_freeze);
+
+int virtio_device_restore(struct virtio_device *dev)
+{
+   struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
+
+   /* We always start by resetting the device, in case a previous
+* driver messed it up. */
+   dev-config-reset(dev);
+
+   /* Acknowledge that we've seen the device. */
+   add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+
+   /* Maybe driver failed before freeze.
+* Restore the failed status, for debugging. */
+   if (dev-failed)
+   add_status(dev, VIRTIO_CONFIG_S_FAILED);
+
+   if (!drv)
+   return 0;
+
+   /* We have a driver! */
+   add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+   dev-config-finalize_features(dev);
+
+   if (drv-restore) {
+   int ret = drv-restore(dev);
+   if (ret) {
+   add_status(dev, VIRTIO_CONFIG_S_FAILED);
+   return ret;
+   }
+   }
+
+   /* Finally, tell the device we're all set */
+   add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_device_restore);
+#endif
+
 static int virtio_init(void)
 {
if (bus_register(virtio_bus) != 0)
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index f39f4e7..d34ebfa 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -57,9 +57,6 @@ struct virtio_pci_device
/* Vectors allocated, excluding per-vq vectors if any */
unsigned msix_used_vectors;
 
-   /* Status saved during hibernate/restore */
-   u8 saved_status;
-
/* Whether we have vector per vq */
bool per_vq_vectors;
 };
@@ -764,16 +761,9 @@ static int virtio_pci_freeze(struct device *dev)
 {
struct pci_dev *pci_dev = to_pci_dev(dev);
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-   struct virtio_driver *drv;
int ret;
 
-   drv = container_of(vp_dev-vdev.dev.driver,
-  struct virtio_driver, driver);
-
-   ret = 0;
-   vp_dev-saved_status = vp_get_status(vp_dev-vdev);
-   if (drv  drv-freeze)
-   ret = drv-freeze(vp_dev-vdev);
+   ret = virtio_device_freeze(vp_dev-vdev);
 
if (!ret)
pci_disable_device(pci_dev);
@@ -784,54 +774,14 @@ static int virtio_pci_restore(struct device *dev)
 {
struct pci_dev *pci_dev = to_pci_dev(dev);
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
-   struct virtio_driver *drv;
-   unsigned status = 0;
int ret;
 
-   drv = container_of(vp_dev-vdev.dev.driver,
-  struct 

[PATCH v3 04/25] virtio: defer config changed notifications

2014-10-12 Thread Michael S. Tsirkin
Defer config changed notifications that arrive during
probe/scan/freeze/restore.

This will allow drivers to set DRIVER_OK earlier, without worrying about
racing with config change interrupts.

This change will also benefit old hypervisors (before 2009)
that send interrupts without checking DRIVER_OK: previously,
the callback could race with driver-specific initialization.

This will also help simplify drivers.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 include/linux/virtio.h  |  6 ++
 drivers/virtio/virtio.c | 57 +
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8df7ba8..5636b11 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -79,6 +79,9 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
  * @failed: saved value for CONFIG_S_FAILED bit (for restore)
+ * @config_enabled: configuration change reporting enabled
+ * @config_changed: configuration change reported while disabled
+ * @config_lock: protects configuration change reporting
  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
  * @config: the configuration ops for this device.
@@ -90,6 +93,9 @@ bool virtqueue_is_broken(struct virtqueue *vq);
 struct virtio_device {
int index;
bool failed;
+   bool config_enabled;
+   bool config_changed;
+   spinlock_t config_lock;
struct device dev;
struct virtio_device_id id;
const struct virtio_config_ops *config;
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 8216b73..2536701 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -117,6 +117,42 @@ void virtio_check_driver_offered_feature(const struct 
virtio_device *vdev,
 }
 EXPORT_SYMBOL_GPL(virtio_check_driver_offered_feature);
 
+static void __virtio_config_changed(struct virtio_device *dev)
+{
+   struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
+
+   if (!dev-config_enabled)
+   dev-config_changed = true;
+   else if (drv  drv-config_changed)
+   drv-config_changed(dev);
+}
+
+void virtio_config_changed(struct virtio_device *dev)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(dev-config_lock, flags);
+   __virtio_config_changed(dev);
+   spin_unlock_irqrestore(dev-config_lock, flags);
+}
+EXPORT_SYMBOL_GPL(virtio_config_changed);
+
+static void virtio_config_disable(struct virtio_device *dev)
+{
+   spin_lock_irq(dev-config_lock);
+   dev-config_enabled = false;
+   spin_unlock_irq(dev-config_lock);
+}
+
+static void virtio_config_enable(struct virtio_device *dev)
+{
+   spin_lock_irq(dev-config_lock);
+   dev-config_enabled = true;
+   __virtio_config_changed(dev);
+   dev-config_changed = false;
+   spin_unlock_irq(dev-config_lock);
+}
+
 static int virtio_dev_probe(struct device *_d)
 {
int err, i;
@@ -153,6 +189,8 @@ static int virtio_dev_probe(struct device *_d)
add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
if (drv-scan)
drv-scan(dev);
+
+   virtio_config_enable(dev);
}
 
return err;
@@ -163,6 +201,8 @@ static int virtio_dev_remove(struct device *_d)
struct virtio_device *dev = dev_to_virtio(_d);
struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
 
+   virtio_config_disable(dev);
+
drv-remove(dev);
 
/* Driver should have reset device. */
@@ -211,6 +251,10 @@ int register_virtio_device(struct virtio_device *dev)
dev-index = err;
dev_set_name(dev-dev, virtio%u, dev-index);
 
+   spin_lock_init(dev-config_lock);
+   dev-config_enabled = false;
+   dev-config_changed = false;
+
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
dev-config-reset(dev);
@@ -239,20 +283,13 @@ void unregister_virtio_device(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_device);
 
-void virtio_config_changed(struct virtio_device *dev)
-{
-   struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
-
-   if (drv  drv-config_changed)
-   drv-config_changed(dev);
-}
-EXPORT_SYMBOL_GPL(virtio_config_changed);
-
 #ifdef CONFIG_PM_SLEEP
 int virtio_device_freeze(struct virtio_device *dev)
 {
struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
 
+   virtio_config_disable(dev);
+
dev-failed = dev-config-get_status(dev)  VIRTIO_CONFIG_S_FAILED;
 
if (drv  drv-freeze)
@@ -297,6 +334,8 @@ int virtio_device_restore(struct virtio_device *dev)
/* Finally, tell the device we're all set */

[PATCH v3 08/25] virtio-net: drop config_mutex

2014-10-12 Thread Michael S. Tsirkin
config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.

Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1
workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.

Get rid of the unnecessary lock.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/net/virtio_net.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 743fb04..23e4a69 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -132,9 +132,6 @@ struct virtnet_info {
/* Work struct for config space updates */
struct work_struct config_work;
 
-   /* Lock for config space updates */
-   struct mutex config_lock;
-
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
 
@@ -1404,7 +1401,6 @@ static void virtnet_config_changed_work(struct 
work_struct *work)
container_of(work, struct virtnet_info, config_work);
u16 v;
 
-   mutex_lock(vi-config_lock);
if (virtio_cread_feature(vi-vdev, VIRTIO_NET_F_STATUS,
 struct virtio_net_config, status, v)  0)
goto done;
@@ -1430,7 +1426,7 @@ static void virtnet_config_changed_work(struct 
work_struct *work)
netif_tx_stop_all_queues(vi-dev);
}
 done:
-   mutex_unlock(vi-config_lock);
+   return;
 }
 
 static void virtnet_config_changed(struct virtio_device *vdev)
@@ -1751,7 +1747,6 @@ static int virtnet_probe(struct virtio_device *vdev)
u64_stats_init(virtnet_stats-rx_syncp);
}
 
-   mutex_init(vi-config_lock);
INIT_WORK(vi-config_work, virtnet_config_changed_work);
 
/* If we can receive ANY GSO packets, we must allocate large ones. */
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 05/25] virtio_blk: drop config_enable

2014-10-12 Thread Michael S. Tsirkin
Now that virtio core ensures config changes don't
arrive during probing, drop config_enable flag
in virtio blk.
On removal, flush is now sufficient to guarantee that
no change work is queued.

This help simplify the driver, and will allow
setting DRIVER_OK earlier without losing config
change notifications.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/block/virtio_blk.c | 23 ---
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a58140..91272f1a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -44,9 +44,6 @@ struct virtio_blk
/* Lock for config space updates */
struct mutex config_lock;
 
-   /* enable config space updates */
-   bool config_enable;
-
/* What host tells us, plus 2 for header  tailer. */
unsigned int sg_elems;
 
@@ -348,8 +345,6 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
u64 capacity, size;
 
mutex_lock(vblk-config_lock);
-   if (!vblk-config_enable)
-   goto done;
 
/* Host must always specify the capacity. */
virtio_cread(vdev, struct virtio_blk_config, capacity, capacity);
@@ -374,7 +369,7 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
set_capacity(vblk-disk, capacity);
revalidate_disk(vblk-disk);
kobject_uevent_env(disk_to_dev(vblk-disk)-kobj, KOBJ_CHANGE, envp);
-done:
+
mutex_unlock(vblk-config_lock);
 }
 
@@ -609,7 +604,6 @@ static int virtblk_probe(struct virtio_device *vdev)
mutex_init(vblk-config_lock);
 
INIT_WORK(vblk-config_work, virtblk_config_changed_work);
-   vblk-config_enable = true;
 
err = init_vq(vblk);
if (err)
@@ -771,10 +765,8 @@ static void virtblk_remove(struct virtio_device *vdev)
int index = vblk-index;
int refc;
 
-   /* Prevent config work handler from accessing the device. */
-   mutex_lock(vblk-config_lock);
-   vblk-config_enable = false;
-   mutex_unlock(vblk-config_lock);
+   /* Make sure no work handler is accessing the device. */
+   flush_work(vblk-config_work);
 
del_gendisk(vblk-disk);
blk_cleanup_queue(vblk-disk-queue);
@@ -784,8 +776,6 @@ static void virtblk_remove(struct virtio_device *vdev)
/* Stop all the virtqueues. */
vdev-config-reset(vdev);
 
-   flush_work(vblk-config_work);
-
refc = atomic_read(disk_to_dev(vblk-disk)-kobj.kref.refcount);
put_disk(vblk-disk);
vdev-config-del_vqs(vdev);
@@ -805,11 +795,7 @@ static int virtblk_freeze(struct virtio_device *vdev)
/* Ensure we don't receive any more interrupts */
vdev-config-reset(vdev);
 
-   /* Prevent config work handler from accessing the device. */
-   mutex_lock(vblk-config_lock);
-   vblk-config_enable = false;
-   mutex_unlock(vblk-config_lock);
-
+   /* Make sure no work handler is accessing the device. */
flush_work(vblk-config_work);
 
blk_mq_stop_hw_queues(vblk-disk-queue);
@@ -823,7 +809,6 @@ static int virtblk_restore(struct virtio_device *vdev)
struct virtio_blk *vblk = vdev-priv;
int ret;
 
-   vblk-config_enable = true;
ret = init_vq(vdev-priv);
if (!ret)
blk_mq_start_stopped_hw_queues(vblk-disk-queue, true);
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 01/25] virtio_pci: fix virtio spec compliance on restore

2014-10-12 Thread Michael S. Tsirkin
On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits

This is in violation of the virtio spec, which
requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK

This behaviour will break with hypervisors that assume spec compliant
behaviour.  It seems like a good idea to have this patch applied to
stable branches to reduce the support butden for the hypervisors.

Cc: sta...@vger.kernel.org
Cc: Amit Shah amit.s...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_pci.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 3d1463c..add40d0 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -789,6 +789,7 @@ static int virtio_pci_restore(struct device *dev)
struct pci_dev *pci_dev = to_pci_dev(dev);
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
struct virtio_driver *drv;
+   unsigned status = 0;
int ret;
 
drv = container_of(vp_dev-vdev.dev.driver,
@@ -799,14 +800,40 @@ static int virtio_pci_restore(struct device *dev)
return ret;
 
pci_set_master(pci_dev);
+   /* We always start by resetting the device, in case a previous
+* driver messed it up. */
+   vp_reset(vp_dev-vdev);
+
+   /* Acknowledge that we've seen the device. */
+   status |= VIRTIO_CONFIG_S_ACKNOWLEDGE;
+   vp_set_status(vp_dev-vdev, status);
+
+   /* Maybe driver failed before freeze.
+* Restore the failed status, for debugging. */
+   status |= vp_dev-saved_status  VIRTIO_CONFIG_S_FAILED;
+   vp_set_status(vp_dev-vdev, status);
+
+   if (!drv)
+   return 0;
+
+   /* We have a driver! */
+   status |= VIRTIO_CONFIG_S_DRIVER;
+   vp_set_status(vp_dev-vdev, status);
+
vp_finalize_features(vp_dev-vdev);
 
-   if (drv  drv-restore)
+   if (drv-restore) {
ret = drv-restore(vp_dev-vdev);
+   if (ret) {
+   status |= VIRTIO_CONFIG_S_FAILED;
+   vp_set_status(vp_dev-vdev, status);
+   return ret;
+   }
+   }
 
/* Finally, tell the device we're all set */
-   if (!ret)
-   vp_set_status(vp_dev-vdev, vp_dev-saved_status);
+   status |= VIRTIO_CONFIG_S_DRIVER_OK;
+   vp_set_status(vp_dev-vdev, status);
 
return ret;
 }
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] KVM: x86: Emulator performs code segment checks on read access

2014-10-12 Thread Paolo Bonzini
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 12/10/2014 08:57, Nadav Amit ha scritto:
 Looks good. I’ll give it a try but it is hard to give a definitive 
 answer, since the emulator is still bug-ridden.

Yes, we need to write unit tests for this, especially the conforming
case.  A bit of a pain to get kvm-unit-tests in ring 3 (access.flat
does it), but I'll give it a shot.

Paolo
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQEcBAEBAgAGBQJUOnBCAAoJEBRUblpOawnXLHEH/3zlIsVFow9IWrZsxaZopWy5
DKncQriHnhsyc6W2U9oFvNgB/7+dTWscdR58jnKLr/Qt64DGH01pq4MvisdhV+31
53pv+CycUgs85EoZkA7MzArbT1Tb/gd/KB8QoqdKIC4+bNEd6JMydcsq5d6nMgOd
yWLYjcYzWzzmJSNCC7UYOtN4SB4brC5RyITLq+CgT4ufSPHtBYxGd8fSHpzvJzTU
T1hqelYcLRFGzoPR4ux4SP8EgXle+sslFV4KAyXkucLIafmeekcmR/AO8hy84TXj
Kcpt6Y3hsY8pWXM3YB4LF/7+NuaPu/Ud+2VkbfuFhq5gUtVLwjtWtA32IlYdFEc=
=BhUU
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Use VM as main machine

2014-10-12 Thread Boylan, Ross
I would like to install a simple system on the hardware level, and do most of 
my work in one of the VM's.
In the past I have accessed VM's graphically using vnc, but if I did that my 
base system (dom0 in Xen, I think) would need to run X server, which would 
complicate it and render the whole setup vulnerable if there was a problem with 
X.  Also, I've had trouble in the past using evolution remotely.

Is there a way to hook a VM more directly to the display, keyboard and mouse?

http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM seemed 
potentially relevant, but tracing through the references I didn't find FLReset+ 
on any PCI devices other than audio, so that seems to be out (although the 
reference page 
http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F
 says that it is out of date).

My immediate interest is a system that's about 5 years old, running Xeon 
E5420's  with VT-x.  It has a VGA compatible controller: Advanced Micro Devices 
[AMD] nee ATI RV570 [Radeon X1950 Pro] (prog-if 00 [VGA controller]) 
I'm also considering doing the same on a more recent system with Xeon E3-1245 v 
2.  It has VT-x and Vt-d, with Intel® HD Graphics P4000.  I might get an 
external video card later.  It also doesn't show FLReset+ except for audio.

Thanks.
Ross Boylan--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Use VM as main machine

2014-10-12 Thread Alex Williamson
On Sun, 2014-10-12 at 18:21 +, Boylan, Ross wrote:
 I would like to install a simple system on the hardware level, and do most of 
 my work in one of the VM's.
 In the past I have accessed VM's graphically using vnc, but if I did that my 
 base system (dom0 in Xen, I think) would need to run X server, which would 
 complicate it and render the whole setup vulnerable if there was a problem 
 with X.  Also, I've had trouble in the past using evolution remotely.
 
 Is there a way to hook a VM more directly to the display, keyboard and mouse?
 
 http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM seemed 
 potentially relevant, but tracing through the references I didn't find 
 FLReset+ on any PCI devices other than audio, so that seems to be out 
 (although the reference page 
 http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F
  says that it is out of date).
 
 My immediate interest is a system that's about 5 years old, running Xeon 
 E5420's  with VT-x.  It has a VGA compatible controller: Advanced Micro 
 Devices [AMD] nee ATI RV570 [Radeon X1950 Pro] (prog-if 00 [VGA controller]) 
 I'm also considering doing the same on a more recent system with Xeon E3-1245 
 v 2.  It has VT-x and Vt-d, with Intel® HD Graphics P4000.  I might get an 
 external video card later.  It also doesn't show FLReset+ except for audio.

FLR is irrelevant, no video devices have it afaict.  Neither of your
video cards work with current VGA/OVMF device assignment either, IGD may
work in the relative short term, Radeon X1xxx devices are too old to
bother with.  See this thread:

https://bbs.archlinux.org/viewtopic.php?id=162768

and this blog:

http://vfio.blogspot.com/

For an idea of what's currently possible.  Be aware that all of this is
bleeding edge and chances of needing to build your own kernel/QEMU are
high.  I'd recommend something like an HD5450 as the the minimum Radeon
class card and an 8-series Nvidia as the minimum on that end.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Use VM as main machine

2014-10-12 Thread Boylan, Ross
Thanks for the info, and thanks also for working on the software.

It sounds as if the short answer is I can't do what I want with my current 
hardware; am I understanding correcty?  And  to get it to work I need two video 
cards?  Do they both need to support the newer capabilities, or is it enough 
that one of them does?

It looks as if the archlinux blog describes binding the input devices as well 
as the video to the VM.  If you need to access the host system, how do you do 
it?  shutdown the VM? ssh  to the host from the VM?  

That seems to raise another challenge: to bring up the guests I first need to 
decrypt a device in the host.  So it would be helpful to see the messages from 
the host and essential to type in a pass-phrase for the host.   But it looks as 
one must specify options for remapping to the host kernel on boot.  If my 
keyboard can only talk to the guest I won't be able to type the  pass-phrase, 
which I must do to bring the guest up.  I hope I've misunderstood; can anyone 
clear these issues up  for me?

If FLReset is irrelevant, why were those  pages discussiing it?  The (outdated) 
http://wiki.xenproject.org/wiki/VTdHowTo says Only devices with FLR 
capabilities are supported..  Is that just a Xen thing?

The Radeon video card wasn't close to cutting edge even when I put it in; I 
picked it because at the time it was the highest Radeon said to have good 
open-source support.

The blog was also interesting because at home it would be great to play games 
in a virtualized MS-Windows, and that clearly requires giving the VM pretty 
direct access to the video card.

Ross

From: Alex Williamson [alex.william...@redhat.com]
Sent: Sunday, October 12, 2014 1:35 PM
To: Boylan, Ross
Cc: kvm@vger.kernel.org
Subject: Re: Use VM as main machine

On Sun, 2014-10-12 at 18:21 +, Boylan, Ross wrote:
 I would like to install a simple system on the hardware level, and do most of 
 my work in one of the VM's.
 In the past I have accessed VM's graphically using vnc, but if I did that my 
 base system (dom0 in Xen, I think) would need to run X server, which would 
 complicate it and render the whole setup vulnerable if there was a problem 
 with X.  Also, I've had trouble in the past using evolution remotely.

 Is there a way to hook a VM more directly to the display, keyboard and mouse?

 http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM seemed 
 potentially relevant, but tracing through the references I didn't find 
 FLReset+ on any PCI devices other than audio, so that seems to be out 
 (although the reference page 
 http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F
  says that it is out of date).

 My immediate interest is a system that's about 5 years old, running Xeon 
 E5420's  with VT-x.  It has a VGA compatible controller: Advanced Micro 
 Devices [AMD] nee ATI RV570 [Radeon X1950 Pro] (prog-if 00 [VGA controller])
 I'm also considering doing the same on a more recent system with Xeon E3-1245 
 v 2.  It has VT-x and Vt-d, with Intel® HD Graphics P4000.  I might get an 
 external video card later.  It also doesn't show FLReset+ except for audio.

FLR is irrelevant, no video devices have it afaict.  Neither of your
video cards work with current VGA/OVMF device assignment either, IGD may
work in the relative short term, Radeon X1xxx devices are too old to
bother with.  See this thread:

https://bbs.archlinux.org/viewtopic.php?id=162768

and this blog:

http://vfio.blogspot.com/

For an idea of what's currently possible.  Be aware that all of this is
bleeding edge and chances of needing to build your own kernel/QEMU are
high.  I'd recommend something like an HD5450 as the the minimum Radeon
class card and an 8-series Nvidia as the minimum on that end.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 86161] New: PROBLEM: On KVM, Window 7 32bit guests sometimes run into blue screen(0x0000005c) during reboot

2014-10-12 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=86161

Bug ID: 86161
   Summary: PROBLEM: On KVM, Window 7 32bit guests sometimes run
into blue screen(0x005c) during reboot
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.10.0 and newer
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: kvm
  Assignee: virtualization_...@kernel-bugs.osdl.org
  Reporter: ng...@qq.com
Regression: No

Created attachment 153371
  -- https://bugzilla.kernel.org/attachment.cgi?id=153371action=edit
The blue screen snapshot

When running Windows 7 32bit guests on qemu-kvm, sometimes the guests run into
blue screen during reboot(Windows' reboot instead of qemu's).

Blue screen stop code:
0x005c(0x010b,0x0003,0x,0x)

Once the error happens to a guest, use qemu 'system_reset' command to restart
it, the error can be reproduced.
Unfortunately, there is neither mini dump nor memory dump, only the above stop
code.

Kernel versions:
I have tested 3.10.0, 3.10.32, 3.10.57, 3.14.21. All of them are affected.
3.9.11 and 3.8.13 are not affected.
I guest the bug is introduced in 3.10.0, and exists in all 3.10 and newer
versions.
Guest OS:
Currently only Windows 7 32bit is affected.
Windows xp 32bit and Windows 7 64 bit are not affected.
Host CPU:
Intel(R) Xeon(R) CPU E5-2620 v2
Host OS:
CentOS 6.5

Recreate steps:
   This error can not be recreated every reboot.
   With below scenario, it can be recreated within 30 minutes:
* Prepare a Windows 7 32bit image as base image. Put below content into a batch
file(like auto_restart.bat):
shutdown /r /t 60
   and put the batch file into the Windows startup folder, or anywhere so long
as it will be called after system startup.
   This will make the Windows guests automatically restart after system startup
in 60 seconds.
* And create 30 child images as below:
   qemu-img create -f qcow2 -o backing_file=win7_base,size=20G inst1.img
* Run all the guests:
   nohup /usr/bin/qemu-system-x86_64 -name inst1 -machine
pc-i440fx-1.5,accel=kvm,usb=off -m 1024 -smp 1,sockets=1,cores=1,threads=1
-drive file=inst1.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=writeback
-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
-vnc 0.0.0.0:6100 -vga qxl -global qxl-vga.ram_size=67108864 -global
qxl-vga.vram_size=67108864 -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/inst1.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=readline -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 21
 /tmp/inst.out 
   Remember to change the vnc port for each guest.
* After 30 minutes, check all the guests with vnc client, you will find some of
the guests have run into blue screen.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] KVM: x86: Emulator performs code segment checks on read access

2014-10-12 Thread Paolo Bonzini
Il 13/10/2014 01:15, Nadav Amit ha scritto:
 I think the problem might be even more fundamental. According to the
 SDM, the privilege level checks (CPL/DPL/RPL) are only performed when
 the segment is loaded; I see no reference to privilege checks when
 data is accessed. You should be able to load a segment with DPL=0
 while you are in CPL=0, then change CPL to 3 and still access the
 segment (obviously, it is not the best practice).

This can be tested without invoking the emulator...

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html