[RFC v4 4/5] VSOCK: Introduce vhost_vsock.ko

2015-12-22 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets vhost transport implementation.  This driver runs on the
host.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v4:
 * Add MAINTAINERS file entry
 * virtqueue used len is now sizeof(pkt->hdr) + pkt->len instead of just
   pkt->len
 * checkpatch.pl cleanups
 * Clarify struct vhost_vsock locking
 * Add comments about optimization that disables virtqueue notify
 * Drop unused vhost_vsock_handle_ctl_kick()
 * Call wake_up() after decrementing total_tx_buf to prevent deadlock
v3:
 * Remove unneeded variable used to store return value
   (Fengguang Wu <fengguang...@intel.com> and Julia Lawall
   <julia.law...@lip6.fr>)
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
vhost: checkpatch.pl cleanups

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 MAINTAINERS   |   2 +
 drivers/vhost/vsock.c | 607 ++
 drivers/vhost/vsock.h |   4 +
 3 files changed, 613 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d8504..0181dc2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11370,6 +11370,8 @@ F:  include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
 F: net/vmw_vsock/virtio_transport_common.c
 F: net/vmw_vsock/virtio_transport.c
+F: drivers/vhost/vsock.c
+F: drivers/vhost/vsock.h
 
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul <thatsly...@gmail.com>
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 0000000..2c5963c
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,607 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "vhost.h"
+#include "vsock.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock {
+   struct vhost_dev dev;
+   struct vhost_virtqueue vqs[VSOCK_VQ_MAX];
+
+   /* Link to global vhost_vsock_list, protected by vhost_vsock_mutex */
+   struct list_head list;
+
+   struct vhost_work send_pkt_work;
+   wait_queue_head_t send_wait;
+
+   /* Fields protected by vqs[VSOCK_VQ_RX].mutex */
+   struct list_head send_pkt_list; /* host->guest pending packets */
+   u32 total_tx_buf;
+
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   return VHOST_VSOCK_DEFAULT_HOST_CID;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   mutex_lock(&vhost_vsock_mutex);
+   list_for_each_entry(vsock, &vhost_vsock_list, list) {
+           if (vsock->guest_cid == guest_cid) {
+                   mutex_unlock(&vhost_vsock_mutex);
+                   return vsock;
+           }
+   }
+   mutex_unlock(&vhost_vsock_mutex);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   bool added = false;
+
+   mutex_lock(&vq->mutex);
+
+   /* Avoid further vmexits, we're already processing the virtqueue */
+   vhost_disable_notify(&vsock->dev, vq);
+
+   for (;;) {
+           struct virtio_vsock_pkt *pkt;
+           struct iov_iter iov_iter;
+           unsigned out, in;
+           size_t nbytes;
+           size_t len;
+           int head;
+
+           if (list_empty(&vsock->send_pkt_list)) {
+                   vhost_enable_notify(&vsock->dev, vq);
+                   break;
+           }
+
+           head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+                                    &out, &in, NULL, NULL);
+           if (head < 0)
+                   break;
+
+           if (head == vq->num) {
+                   /* We cannot finish yet if more buffers snuck in while
+                    * re-enabling notify.
+                    */
+                   if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+                           vhost_disable_notify(&vsock->dev, vq);
+                           continue;
+                   }
+                   break;
+           }
+
+           pkt = list_first_entry(&vsock->

[RFC v4 0/5] Add virtio transport for AF_VSOCK

2015-12-22 Thread Stefan Hajnoczi
This series is based on v4.4-rc2 and the "virtio: make find_vqs()
checkpatch.pl-friendly" patch I recently submitted.

v4:
 * Addressed code review comments from Alex Bennee
 * MAINTAINERS file entries for new files
 * Trace events instead of pr_debug()
 * RST packet is sent when there is no listen socket
 * Allow guest->host connections again (began discussing netfilter support with
   Matt Benjamin instead of hard-coding security policy in virtio-vsock code)
 * Many checkpatch.pl cleanups (will be 100% clean in v5)

v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
   (also drop v2 Patch 1, it's only needed for SOCK_DGRAM)
 * Only allow host->guest connections (same security model as latest
   VMware)
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann <a...@arndb.de>)
 * Remove unneeded variable used to store return value
   (Fengguang Wu <fengguang...@intel.com> and Julia Lawall
   <julia.law...@lip6.fr>)

v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://permalink.gmane.org/gmane.comp.emulators.virtio.devel/980

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM semantics.  Applications on the host can easily connect to guest
agents because the sockets API allows multiple connections to a listen socket
(unlike virtio-serial).  This simplifies the guest<->host communication and
eliminates the need for extra processes on the host to arbitrate virtio-serial
ports.

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest must be assigned a CID (3 in the example above).
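
As a rough userspace illustration of the addressing above, a host-side
program could build the guest's address and connect to it as sketched below.
This is only a sketch: the helper names vsock_addr() and vsock_connect() are
made up for this example, and the port number a caller picks is up to the
application.  It assumes the userspace headers provide AF_VSOCK and
struct sockaddr_vm (<sys/socket.h> and <linux/vm_sockets.h>).

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: build an AF_VSOCK address for a CID and port. */
static struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;	/* 2 is the host, 3 is the guest in the example */
	addr.svm_port = port;
	return addr;
}

/* Hypothetical helper: connect a SOCK_STREAM vsock socket to cid:port.
 * Returns the connected fd, or -1 on failure.
 */
static int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr = vsock_addr(cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With the QEMU command line above, the host would call vsock_connect(3, port)
for whatever port the guest agent listens on.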

Status
------
This patch series implements the latest draft specification.  Please review.

Asias He (4):
  VSOCK: Introduce virtio_vsock_common.ko
  VSOCK: Introduce virtio_transport.ko
  VSOCK: Introduce vhost_vsock.ko
  VSOCK: Add Makefile and Kconfig

Stefan Hajnoczi (1):
  VSOCK: transport-specific vsock_transport functions

 MAINTAINERS|  13 +
 drivers/vhost/Kconfig  |  15 +
 drivers/vhost/Makefile |   4 +
 drivers/vhost/vsock.c  | 607 +++
 drivers/vhost/vsock.h  |   4 +
 include/linux/virtio_vsock.h   | 167 +
 include/net/af_vsock.h |   3 +
 .../trace/events/vsock_virtio_transport_common.h   | 144 
 include/uapi/linux/virtio_ids.h|   1 +
 include/uapi/linux/virtio_vsock.h  |  87 +++
 net/vmw_vsock/Kconfig  |  19 +
 net/vmw_vsock/Makefile |   2 +
 net/vmw_vsock/af_vsock.c   |   9 +
 net/vmw_vsock/virtio_transport.c   | 481 
 net/vmw_vsock/virtio_transport_common.c| 834 +
 15 files changed, 2390 insertions(+)
 create 

[RFC v4 3/5] VSOCK: Introduce virtio_transport.ko

2015-12-22 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets virtio transport implementation.  This driver runs in the
guest.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v4:
 * Add MAINTAINERS file entry
 * Drop short/long rx packets
 * checkpatch.pl cleanups
 * Clarify locking in struct virtio_vsock
 * Narrow local variable scopes as suggested by Alex Bennee
 * Call wake_up() after decrementing total_tx_buf to avoid deadlock
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 MAINTAINERS  |   1 +
 net/vmw_vsock/virtio_transport.c | 481 +++
 2 files changed, 482 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d42db78..67d8504 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11369,6 +11369,7 @@ S:  Maintained
 F: include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
 F: net/vmw_vsock/virtio_transport_common.c
+F: net/vmw_vsock/virtio_transport.c
 
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul <thatsly...@gmail.com>
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 0000000..e4787bf
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,481 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * Some of the code is taken from Gerd Hoffmann <kra...@redhat.com>'s
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   struct virtio_device *vdev;
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+
+   /* Virtqueue processing is deferred to a workqueue */
+   struct work_struct tx_work;
+   struct work_struct rx_work;
+
+   wait_queue_head_t tx_wait;  /* for waiting for tx resources */
+
+   /* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
+* must be accessed with tx_lock held.
+*/
+   struct mutex tx_lock;
+   u32 total_tx_buf;
+
+   /* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
+* must be accessed with rx_lock held.
+*/
+   struct mutex rx_lock;
+   int rx_buf_nr;
+   int rx_buf_max_nr;
+
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_one_pkt(struct virtio_vsock *vsock,
+ struct virtio_vsock_pkt *pkt)
+{
+   struct scatterlist hdr, buf, *sgs[2];
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtqueue *vq;
+   DEFINE_WAIT(wait);
+
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   /* Put pkt in the virtqueue */
+   sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
+   sgs[out_sg++] = &hdr;
+   if (pkt->buf) {
+           sg_init_one(&buf, pkt->buf, pkt->len);
+           sgs[out_sg++] = &buf;
+   }
+
+   mutex_lock(&vsock->tx_lock);
+   while ((ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt,
+                                   GFP_KERNEL)) < 0) {
+           prepare_to_wait_exclusive(&vsock->tx_wait, &wait,
+                                     TASK_UNINTERRUPTIBLE);
+           mutex_unlock(&vsock->tx_lock);
+           schedule();
+           mutex_lock(&vsock->tx_lock);
+           finish_wait(&vsock->tx_wait, &wait);
+   }
+   virtqueue_kick(vq);
+   mutex_unlock(&vsock->tx_lock);
+
+   return pkt->len;
+}
+
+static int
+virtio_transport_send_pkt_no_sock(struct virtio_vsock_pkt *pkt)
+{
+   struct virtio_vsock *vsock;
+
+   vsock = virtio_vsock_get();
+   if (!vsock) {
+   virtio_transport_free_pkt(pkt);
+   return -ENODEV;
+   }
+
+   return virtio_transport_send_one_pkt(vsock, pkt);
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   struct virtio_vsock_sock *vvs;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   u32 pkt_len = info->pkt_len;
+   DEF

[RFC v4 1/5] VSOCK: transport-specific vsock_transport functions

2015-12-22 Thread Stefan Hajnoczi
struct vsock_transport contains function pointers called by AF_VSOCK
core code.  The transport may want its own transport-specific function
pointers and they can be added after struct vsock_transport.

Allow the transport to fetch vsock_transport.  It can downcast it to
access transport-specific function pointers.

The virtio transport will use this.
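
The embed-and-downcast pattern described here can be sketched in plain C as
follows.  All names in the sketch (base_transport, ext_transport,
core_get_transport() and so on) are stand-ins invented for illustration; they
are not the kernel's definitions.  The only property it relies on is that the
base struct is the first member of the extended one.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct vsock_transport: pointers the core code calls. */
struct base_transport {
	int (*get_local_cid)(void);
};

/* Stand-in for a transport that adds its own function pointers. */
struct ext_transport {
	struct base_transport transport;	/* must be the first field */
	int (*send_pkt)(int len);
};

/* The downcast is only valid while the base struct sits at offset 0. */
_Static_assert(offsetof(struct ext_transport, transport) == 0,
	       "base transport must be the first member");

static int demo_get_local_cid(void) { return 2; }
static int demo_send_pkt(int len) { return len; }

static const struct ext_transport demo_transport = {
	.transport = { .get_local_cid = demo_get_local_cid },
	.send_pkt = demo_send_pkt,
};

/* What the core remembers after registration. */
static const struct base_transport *registered = &demo_transport.transport;

/* Stand-in for vsock_core_get_transport(). */
static const struct base_transport *core_get_transport(void)
{
	return registered;
}

/* Transport-side downcast to reach the transport-specific pointers. */
static const struct ext_transport *to_ext(const struct base_transport *t)
{
	assert(t != NULL);
	return (const struct ext_transport *)t;
}
```

The core keeps calling through the base pointers, while the transport's own
code uses to_ext(core_get_transport())->send_pkt(...) to reach the extras.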

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/net/af_vsock.h   | 3 +++
 net/vmw_vsock/af_vsock.c | 9 +
 2 files changed, 12 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index e9eb2d6..23f5525 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -165,6 +165,9 @@ static inline int vsock_core_init(const struct vsock_transport *t)
 }
 void vsock_core_exit(void);
 
+/* The transport may downcast this to access transport-specific functions */
+const struct vsock_transport *vsock_core_get_transport(void);
+
 /**** UTILS ****/
 
 void vsock_release_pending(struct sock *pending);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7fd1220..9783a38 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1994,6 +1994,15 @@ void vsock_core_exit(void)
 }
 EXPORT_SYMBOL_GPL(vsock_core_exit);
 
+const struct vsock_transport *vsock_core_get_transport(void)
+{
+   /* vsock_register_mutex not taken since only the transport uses this
+* function and only while registered.
+*/
+   return transport;
+}
+EXPORT_SYMBOL_GPL(vsock_core_get_transport);
+
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
 MODULE_VERSION("1.0.1.0-k");
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v4 5/5] VSOCK: Add Makefile and Kconfig

2015-12-22 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v4:
 * Make checkpatch.pl happy with longer option description
 * Clarify dependency on virtio rather than QEMU as suggested by Alex
   Bennee
v3:
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann <a...@arndb.de>)
---
 drivers/vhost/Kconfig  | 15 +++
 drivers/vhost/Makefile |  4 
 net/vmw_vsock/Kconfig  | 19 +++
 net/vmw_vsock/Makefile |  2 ++
 4 files changed, 40 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..d7aae9e 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -21,6 +21,21 @@ config VHOST_SCSI
Say M here to enable the vhost_scsi TCM fabric module
for use with virtio-scsi guests
 
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   select VHOST
+   select VHOST_RING
+   default n
+   ---help---
+   This kernel module can be loaded in the host kernel to provide AF_VSOCK
+   sockets for communicating with guests.  The guests must have the
+   virtio_transport.ko driver loaded to use the virtio-vsock device.
+
+   To compile this driver as a module, choose M here: the module will be called
+   vhost_vsock.
+
 config VHOST_RING
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..f27e74b 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,22 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine host supports Virtual
+ Sockets over virtio.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0



[RFC v4 2/5] VSOCK: Introduce virtio_vsock_common.ko

2015-12-22 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

This module contains the common code and header files for the following
virtio_transport and vhost_vsock kernel modules.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v4:
 * Add MAINTAINERS file entry
 * checkpatch.pl cleanups
 * linux_vsock.h: drop wrong copy-pasted license header
 * Move tx sock refcounting to virtio_transport_alloc/free_pkt() to fix
   leaks in error paths
 * Add send_pkt_no_sock() to send RST packets with no listen socket
 * Rename per-socket state from virtio_transport to virtio_vsock_sock
 * Move send_pkt_ops to new virtio_transport struct
 * Drop dumppkt function, packet capture will be added in the future
 * Drop empty virtio_transport_dec_tx_pkt()
 * Allow guest->host connections again
 * Use trace events instead of pr_debug()
v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
 * Only allow host->guest connections (same security model as latest
   VMware)
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 MAINTAINERS|  10 +
 include/linux/virtio_vsock.h   | 167 +
 .../trace/events/vsock_virtio_transport_common.h   | 144 
 include/uapi/linux/virtio_ids.h|   1 +
 include/uapi/linux/virtio_vsock.h  |  87 +++
 net/vmw_vsock/virtio_transport_common.c| 834 +
 6 files changed, 1243 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/trace/events/vsock_virtio_transport_common.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 050d0e7..d42db78 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11360,6 +11360,16 @@ S: Maintained
 F: drivers/media/v4l2-core/videobuf2-*
 F: include/media/videobuf2-*
 
+VIRTIO AND VHOST VSOCK DRIVER
+M: Stefan Hajnoczi <stefa...@redhat.com>
+L: kvm@vger.kernel.org
+L: virtualizat...@lists.linux-foundation.org
+L: net...@vger.kernel.org
+S: Maintained
+F: include/linux/virtio_vsock.h
+F: include/uapi/linux/virtio_vsock.h
+F: net/vmw_vsock/virtio_transport_common.c
+
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul <thatsly...@gmail.com>
 S: Maintained
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 0000000..4acf1ad
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,167 @@
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include 
+#include 
+#include 
+#include 
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xFFFFFFFFUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE   (1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE(1024 * 64)
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* Per-socket state (accessed via vsk->trans) */
+struct virtio_vsock_sock {
+   struct vsock_sock *vsk;
+
+   /* Protected by lock_sock(sk_vsock(trans->vsk)) */
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   /* Protected by tx_lock */
+   u32 tx_cnt;
+   u32 buf_alloc;
+   u32 peer_fwd_cnt;
+   u32 peer_buf_alloc;
+
+   /* Protected by rx_lock */
+   u32 fwd_cnt;
+   u32 rx_bytes;
+   struct list_head rx_queue;
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+   u32 remote_cid, remote_port;
+   struct msghdr *msg;
+   u32 pkt_len;
+   u16 type;
+   u16 op;
+   u32 flags;
+};
+
+struct virtio_transport {
+   /* This must be the first field */
+   struct vsock_transport transport;
+
+   /* Send packet for a specific socket */
+   int (*send_pkt)(struct vsock_sock *vsk,
+   struct virtio_vsock_pkt_info *info);
+
+   /* Send p

Re: [PATCH v3 3/4] VSOCK: Introduce vhost-vsock.ko

2015-12-15 Thread Stefan Hajnoczi
On Fri, Dec 11, 2015 at 01:45:29PM +0000, Alex Bennée wrote:
> > +   if (head == vq->num) {
> > +           if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> > +                   vhost_disable_notify(&vsock->dev, vq);
> > +                   continue;
> 
> Why are we doing this? If we enable something we then disable it? A
> comment as to what is going on here would be useful.

This is a standard optimization to avoid vmexits that other vhost
devices and QEMU implement too.

When the host begins pulling buffers off a virtqueue it first disables
guest->host notifications.  If the guest adds additional buffers while
the host is processing, the notification (vmexit) is skipped.  The host
re-enables guest->host notifications when it finishes virtqueue
processing.

If the guest added buffers after vhost_get_vq_desc() but before
vhost_enable_notify(), then vhost_enable_notify() returns true and the
host must process the buffers (i.e. restart the loop).  Failure to do so
could result in deadlocks because the guest didn't notify and the host
would be waiting for a notification.

I will add comments to the code.
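
To make the race concrete, here is a small userspace model of the loop.  It
is a sketch only: struct mock_vq and its helpers are invented for this
example and merely mimic the return-value convention of
vhost_enable_notify() described above.  The late_arrivals field simulates the
guest adding buffers right after the host saw an empty queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for a virtqueue; not the vhost data structure. */
struct mock_vq {
	int pending;		/* buffers the guest has made available */
	bool notify_enabled;	/* will the guest kick us for new buffers? */
	int late_arrivals;	/* buffers that sneak in on the first empty poll */
};

static void disable_notify(struct mock_vq *vq)
{
	vq->notify_enabled = false;
}

/* Returns true if buffers showed up while notifications were disabled. */
static bool enable_notify(struct mock_vq *vq)
{
	vq->notify_enabled = true;
	return vq->pending > 0;
}

/* Returns false when the queue looks empty to the host. */
static bool get_buf(struct mock_vq *vq)
{
	if (vq->pending > 0) {
		vq->pending--;
		return true;
	}
	/* Simulate the guest adding buffers just after the empty check. */
	vq->pending += vq->late_arrivals;
	vq->late_arrivals = 0;
	return false;
}

/* Process all buffers; returns how many were handled. */
static int drain(struct mock_vq *vq)
{
	int processed = 0;

	disable_notify(vq);
	for (;;) {
		if (!get_buf(vq)) {
			/* Re-check: buffers may have arrived without a kick. */
			if (enable_notify(vq)) {
				disable_notify(vq);
				continue;
			}
			break;
		}
		processed++;
	}
	assert(vq->notify_enabled);	/* never leave with kicks disabled */
	return processed;
}
```

Without the re-check after enable_notify(), the late buffers in this model
would sit in the queue with nobody notified, which is the deadlock described
above.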

> > +   vhost_add_used(vq, head, pkt->len); /* TODO should this be sizeof(pkt->hdr) + pkt->len? */
> 
> TODO needs sorting out or removing.

Will fix in the next revision.

> > +   /* Respect global tx buf limitation */
> > +   mutex_lock(&vq->mutex);
> > +   while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
> 
> I'm curious about the relationship between
> VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE above and VIRTIO_VSOCK_MAX_TX_BUF_SIZE
> just here. Why do we need to limit pkt_len to the smaller when really
> all that matters is pkt_len + vsock->total_tx_buf >
> VIRTIO_VSOCK_MAX_TX_BUF_SIZE?

There are two separate issues:

1. The total amount of pending data.  The idea is to stop queuing
   packets and make the caller wait until resources become available so
   that vhost_vsock.ko memory consumption is bounded.

   The total_tx_buf limit is artificial and lower than the actual
   virtqueue maximum data size.  Otherwise we could just rely on the
   virtqueue to limit the size but it can be very large.

2. Splitting data into packets that fit into rx virtqueue buffers.  The
   guest sets up the rx virtqueue with VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
   buffers.  Here, vhost_vsock.ko is assuming that the rx virtqueue
   buffers are always VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE bytes so it
   splits data along this boundary.

   This is ugly because the guest could choose a different buffer size
   and the host has VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE hardcoded.  I'll
   look into eliminating this assumption.
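
The two checks can be written down separately, roughly like this.  It is a
userspace sketch with invented helper names (tx_must_wait(),
packets_needed()); the constants mirror the values of
VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE and VIRTIO_VSOCK_MAX_TX_BUF_SIZE from the
header, and the sketch does not model the waiting itself.

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_BUF_SIZE	(1024 * 4)		/* VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE */
#define MAX_TX_BUF_SIZE	(1024 * 1024 * 16)	/* VIRTIO_VSOCK_MAX_TX_BUF_SIZE */

/* Issue 1: would queuing pkt_len more bytes exceed the global budget?
 * If so the caller has to wait until total_tx_buf drops.
 */
static bool tx_must_wait(size_t pkt_len, size_t total_tx_buf)
{
	return pkt_len + total_tx_buf > MAX_TX_BUF_SIZE;
}

/* Issue 2: how many packets are needed if each one must fit into a
 * single rx virtqueue buffer of RX_BUF_SIZE bytes?
 */
static size_t packets_needed(size_t len)
{
	return (len + RX_BUF_SIZE - 1) / RX_BUF_SIZE;
}
```

For example, a 10000 byte write is split into three packets, and each of
them is checked against the 16 MB budget on its own.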

> > +static void vhost_vsock_handle_ctl_kick(struct vhost_work *work)
> > +{
> > +   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> > + poll.work);
> > +   struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
> > +dev);
> > +
> > +   pr_debug("%s vq=%p, vsock=%p\n", __func__, vq, vsock);
> > +}
> 
> This doesn't handle anything, it just prints debug stuff. Should this be
> a NOP function?

The control virtqueue is currently not used.  In the next revision this
function will be dropped.

> > +static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> > +{
> > +   struct vhost_virtqueue *vq;
> > +   int i;
> > +
> > +   if (features & ~VHOST_VSOCK_FEATURES)
> > +   return -EOPNOTSUPP;
> > +
> > +   mutex_lock(&vsock->dev.mutex);
> > +   if ((features & (1 << VHOST_F_LOG_ALL)) &&
> > +       !vhost_log_access_ok(&vsock->dev)) {
> > +           mutex_unlock(&vsock->dev.mutex);
> > +           return -EFAULT;
> > +   }
> > +
> > +   for (i = 0; i < VSOCK_VQ_MAX; i++) {
> > +           vq = &vsock->vqs[i].vq;
> > +           mutex_lock(&vq->mutex);
> > +           vq->acked_features = features;
> 
> Is this a user supplied flag? Should it be masked to valid values?

That is already done above where VHOST_VSOCK_FEATURES is checked.
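
In other words the check rejects the whole request instead of masking it.
A stripped-down userspace sketch of that pattern follows; the DEMO_* feature
bits and demo_set_features() are invented for illustration and are not the
vhost definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Invented feature bits standing in for VHOST_VSOCK_FEATURES. */
#define DEMO_F_A		(1ULL << 0)
#define DEMO_F_B		(1ULL << 26)
#define DEMO_SUPPORTED		(DEMO_F_A | DEMO_F_B)

/* Reject any request that contains an unsupported bit; only a fully
 * valid feature set is ever stored in *acked.
 */
static int demo_set_features(uint64_t *acked, uint64_t features)
{
	if (features & ~DEMO_SUPPORTED)
		return -EOPNOTSUPP;

	*acked = features;
	return 0;
}
```

Because the function returns before the assignment, acked_features can never
hold a bit outside the supported set, so no extra mask is needed later.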


signature.asc
Description: PGP signature


Re: [PATCH v3 4/4] VSOCK: Add Makefile and Kconfig

2015-12-15 Thread Stefan Hajnoczi
On Fri, Dec 11, 2015 at 05:19:08PM +0000, Alex Bennée wrote:
> > +config VHOST_VSOCK
> > +   tristate "vhost virtio-vsock driver"
> > +   depends on VSOCKETS && EVENTFD
> > +   select VIRTIO_VSOCKETS_COMMON
> > +   select VHOST
> > +   select VHOST_RING
> > +   default n
> > +   ---help---
> > +   Say M here to enable the vhost-vsock for virtio-vsock guests
> 
> I think checkpatch prefers a few more words for the feature but I'm
> happy with it.

I have expanded the description.

> > diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
> > index 14810ab..74e0bc8 100644
> > --- a/net/vmw_vsock/Kconfig
> > +++ b/net/vmw_vsock/Kconfig
> > @@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
> >
> >   To compile this driver as a module, choose M here: the module
> >   will be called vmw_vsock_vmci_transport. If unsure, say N.
> > +
> > +config VIRTIO_VSOCKETS
> > +   tristate "virtio transport for Virtual Sockets"
> > +   depends on VSOCKETS && VIRTIO
> > +   select VIRTIO_VSOCKETS_COMMON
> > +   help
> > + This module implements a virtio transport for Virtual Sockets.
> > +
> > + Enable this transport if your Virtual Machine runs on Qemu/KVM.
> 
> Is this better worded as:
> 
> "Enable this transport if your Virtual Machine host supports vsockets
> over virtio."

Good idea.  Will fix in the next revision.




Re: [PATCH v3 1/4] VSOCK: Introduce virtio-vsock-common.ko

2015-12-10 Thread Stefan Hajnoczi
On Thu, Dec 10, 2015 at 10:17:07AM +0000, Alex Bennée wrote:
> Stefan Hajnoczi <stefa...@redhat.com> writes:
> 
> > From: Asias He <as...@redhat.com>
> >
> > This module contains the common code and header files for the following
> > virtio-vsock and virtio-vhost kernel modules.
> 
> General comment: checkpatch has a bunch of warnings about 80 character
> limits, extra braces and BUG_ON usage.

Will fix in the next version.

> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > new file mode 100644
> > index 0000000..e54eb45
> > --- /dev/null
> > +++ b/include/linux/virtio_vsock.h
> > @@ -0,0 +1,203 @@
> > +/*
> > + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> > + * anyone can use the definitions to implement compatible drivers/servers:
> 
> Is anything in here actually exposed to userspace or the guest? The
> #ifdef __KERNEL__ statement seems redundant for this file at least.

You are right.  I think the header was copied from a uapi file.

I'll compare against other virtio code and apply an appropriate header.

> > +void virtio_vsock_dumppkt(const char *func,  const struct virtio_vsock_pkt *pkt)
> > +{
> > +   pr_debug("%s: pkt=%p, op=%d, len=%d, %d:%d---%d:%d, len=%d\n",
> > +func, pkt,
> > +le16_to_cpu(pkt->hdr.op),
> > +le32_to_cpu(pkt->hdr.len),
> > +le32_to_cpu(pkt->hdr.src_cid),
> > +le32_to_cpu(pkt->hdr.src_port),
> > +le32_to_cpu(pkt->hdr.dst_cid),
> > +le32_to_cpu(pkt->hdr.dst_port),
> > +pkt->len);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_vsock_dumppkt);
> 
> Why export this at all? The only users are in this file so you could
> make it static.

I'll make it static.

> > +u32 virtio_transport_get_credit(struct virtio_transport *trans, u32 credit)
> > +{
> > +   u32 ret;
> > +
> > +   mutex_lock(&trans->tx_lock);
> > +   ret = trans->peer_buf_alloc - (trans->tx_cnt - trans->peer_fwd_cnt);
> > +   if (ret > credit)
> > +           ret = credit;
> > +   trans->tx_cnt += ret;
> > +   mutex_unlock(&trans->tx_lock);
> > +
> > +   pr_debug("%s: ret=%d, buf_alloc=%d, peer_buf_alloc=%d,"
> > +"tx_cnt=%d, fwd_cnt=%d, peer_fwd_cnt=%d\n", __func__,
> 
> I think __func__ is superfluous here as the dynamic print code already
> has it and can print it when required. Having said that there seems to
> be plenty of code already in the kernel that uses __func__ :-/

I'll convert most printks to tracepoints in the next revision.

> > +u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk)
> > +{
> > +   struct virtio_transport *trans = vsk->trans;
> > +
> > +   return trans->buf_size_max;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_get_max_buffer_size);
> 
> All these accessor functions seem pretty simple. Maybe they should be
> inline header functions or even #define macros?

They are used as struct vsock_transport function pointers.  What is the
advantage to inlining them?

> > +int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
> > +   ssize_t written, struct vsock_transport_send_notify_data *data)
> > +{
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_notify_send_post_enqueue);
> 
> This makes me wonder if the calling code should be having
> if(transport->fn) checks rather than filling stuff out with null
> implementations but I guess that's a question better aimed at the
> maintainers.

I've considered it too.  I'll try to streamline this in the next
revision.
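
The alternative discussed here, checking for a missing callback in the caller instead of providing empty implementations, could look roughly like this. It is a user-space sketch with made-up names (`demo_notify_ops`, `call_send_post_enqueue`, `demo_hook`), not the vsock core API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table with one optional hook. */
struct demo_notify_ops {
	/* NULL means the transport has nothing to do at this point. */
	int (*send_post_enqueue)(int written);
};

/* Caller-side wrapper: a missing hook is treated as success. */
static int call_send_post_enqueue(const struct demo_notify_ops *ops, int written)
{
	assert(ops != NULL);
	if (!ops->send_post_enqueue)
		return 0;
	return ops->send_post_enqueue(written);
}

/* Example hook for a transport that does care about the event. */
static int demo_hook(int written)
{
	return written < 0 ? -1 : 0;
}
```

A transport that needs no notification simply leaves the pointer NULL, which removes the stub functions and their exports.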

> > +/* We are under the virtio-vsock's vsock->rx_lock or
> > + * vhost-vsock's vq->mutex lock */
> > +void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
> > +{
> > +   struct virtio_transport *trans;
> > +   struct sockaddr_vm src, dst;
> > +   struct vsock_sock *vsk;
> > +   struct sock *sk;
> > +
> > +   vsock_addr_init(&src, le32_to_cpu(pkt->hdr.src_cid), le32_to_cpu(pkt->hdr.src_port));
> > +   vsock_addr_init(&dst, le32_to_cpu(pkt->hdr.dst_cid), le32_to_cpu(pkt->hdr.dst_port));
> > +
> > +   virtio_vsock_dumppkt(__func__, pkt);
> > +
> > +   if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
> > +   /* TODO send RST */
> 
> TODO's shouldn't make it into final submissions.
> 
> > +   goto free_pkt;
> > +   }
> > +
> > +   /* The socket must be in connected or bound table
> > +* otherwise send reset back
> > +*/
> > +   sk = vsock_find_connected_socket(&src, &dst);
> > +   if (!sk) {
> > +   sk = vsock_find_bound_socket(&dst);
> > +   if (!sk) {
> > +   pr_debug("%s: can not find bound_socket\n", __func__);
> > +   virtio_vsock_dumppkt(__func__, pkt);
> > +   /* Ignore this pkt instead of sending reset back */
> > +   /* TODO send a RST unless this packet is a RST (to avoid infinite loops) */
> 
> Ditto.

Thanks, I'll complete the RST code in the next revision.




Re: [PATCH v3 2/4] VSOCK: Introduce virtio-vsock.ko

2015-12-10 Thread Stefan Hajnoczi
On Thu, Dec 10, 2015 at 09:23:25PM +, Alex Bennée wrote:
> Stefan Hajnoczi <stefa...@redhat.com> writes:
> 
> > From: Asias He <as...@redhat.com>
> >
> > VM sockets virtio transport implementation. This module runs in guest
> > kernel.
> 
> checkpatch warns on a bunch of whitespace/tab issues.

Will fix in the next version.

> > +struct virtio_vsock {
> > +   /* Virtio device */
> > +   struct virtio_device *vdev;
> > +   /* Virtio virtqueue */
> > +   struct virtqueue *vqs[VSOCK_VQ_MAX];
> > +   /* Wait queue for send pkt */
> > +   wait_queue_head_t queue_wait;
> > +   /* Work item to send pkt */
> > +   struct work_struct tx_work;
> > +   /* Work item to recv pkt */
> > +   struct work_struct rx_work;
> > +   /* Mutex to protect send pkt*/
> > +   struct mutex tx_lock;
> > +   /* Mutex to protect recv pkt*/
> > +   struct mutex rx_lock;
> 
> Further down I got confused by what lock was what and exactly what was
> being protected. If the receive and transmit paths touch separate things
> it might be worth re-arranging the structure to make it clearer, eg:
> 
>/* The transmit path is protected by tx_lock */
>struct mutex tx_lock;
>struct work_struct tx_work;
>..
>..
> 
>/* The receive path is protected by rx_lock */
>wait_queue_head_t queue_wait;
>..
>..
> 
>  Which might make things a little clearer. Then all the redundant
>  information in the comments can be removed. I don't need to know what
>  is a Virtio device, virtqueue or wait_queue etc as they are implicit in
>  the structure name.

Thanks, that is a nice idea.

> > +   mutex_lock(&vsock->tx_lock);
> > +   while ((ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt,
> > +   GFP_KERNEL)) < 0) {
> > +   prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
> > + TASK_UNINTERRUPTIBLE);
> > +   mutex_unlock(&vsock->tx_lock);
> > +   schedule();
> > +   mutex_lock(&vsock->tx_lock);
> > +   finish_wait(&vsock->queue_wait, &wait);
> > +   }
> > +   virtqueue_kick(vq);
> > +   mutex_unlock(&vsock->tx_lock);
> 
> What are we protecting with tx_lock here? See comments above about
> making the lock usage semantics clearer.

vq (vsock->vqs[VSOCK_VQ_TX]) is being protected.  Concurrent calls to
virtqueue_add_sgs() are not allowed.

> > +
> > +   return pkt_len;
> > +}
> > +
> > +static struct virtio_transport_pkt_ops virtio_ops = {
> > +   .send_pkt = virtio_transport_send_pkt,
> > +};
> > +
> > +static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> > +{
> > +   int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > +   struct virtio_vsock_pkt *pkt;
> > +   struct scatterlist hdr, buf, *sgs[2];
> > +   struct virtqueue *vq;
> > +   int ret;
> > +
> > +   vq = vsock->vqs[VSOCK_VQ_RX];
> > +
> > +   do {
> > +   pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> > +   if (!pkt) {
> > +   pr_debug("%s: fail to allocate pkt\n", __func__);
> > +   goto out;
> > +   }
> > +
> > +   /* TODO: use mergeable rx buffer */
> 
> > TODO's shouldn't end up in merged code.

Will fix in next revision.

> > +   pkt->buf = kmalloc(buf_len, GFP_KERNEL);
> > +   if (!pkt->buf) {
> > +   pr_debug("%s: fail to allocate pkt->buf\n", __func__);
> > +   goto err;
> > +   }
> > +
> > +   sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> > +   sgs[0] = &hdr;
> > +
> > +   sg_init_one(&buf, pkt->buf, buf_len);
> > +   sgs[1] = &buf;
> > +   ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
> > +   if (ret)
> > +   goto err;
> > +   vsock->rx_buf_nr++;
> > +   } while (vq->num_free);
> > +   if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
> > +   vsock->rx_buf_max_nr = vsock->rx_buf_nr;
> > +out:
> > +   virtqueue_kick(vq);
> > +   return;
> > +err:
> > +   virtqueue_kick(vq);
> > +   virtio_transport_free_pkt(pkt);
> 
> You could free the pkt memory at the fail site and just have one exit path.

Okay, I agree the err label is of marginal use.  Let's get rid of it.
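
A rough user-space sketch of the agreed cleanup follows: free at the point of failure and leave the loop, so the function has a single exit. `demo_pkt` and `demo_rx_fill()` are hypothetical, malloc()/calloc() stand in for kmalloc()/kzalloc(), and the virtqueue kick is only marked by a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced packet: just a payload buffer and its size. */
struct demo_pkt {
	void *buf;
	size_t len;
};

/* Fill up to `slots` entries of `ring`; returns how many were filled. */
static int demo_rx_fill(struct demo_pkt **ring, int slots, size_t buf_len)
{
	int filled = 0;

	assert(ring != NULL);
	while (filled < slots) {
		struct demo_pkt *pkt = calloc(1, sizeof(*pkt));

		if (!pkt)
			break;

		pkt->buf = malloc(buf_len);
		if (!pkt->buf) {
			free(pkt); /* cleaned up at the fail site, no err label */
			break;
		}
		pkt->len = buf_len;
		ring[filled++] = pkt;
	}

	/* Single exit path: the real code would call virtqueue_kick() here. */
	return filled;
}
```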

> 
> > +   return;
> > +}
> > +
> > +static void virtio_transport_send_pkt_work(struct work_struct *work)
> > +{
> > +   struct virtio_vsock *vsock =
> &g

[PATCH v3 1/4] VSOCK: Introduce virtio-vsock-common.ko

2015-12-09 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

This module contains the common code and header files for the following
virtio-vsock and vhost-vsock kernel modules.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
 * Only allow host->guest connections (same security model as latest
   VMware)
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 include/linux/virtio_vsock.h| 203 
 include/uapi/linux/virtio_ids.h |   1 +
 include/uapi/linux/virtio_vsock.h   |  87 
 net/vmw_vsock/virtio_transport_common.c | 854 
 4 files changed, 1145 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 000..e54eb45
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,203 @@
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2013-2015
+ * Copyright (C) Asias He <as...@redhat.com>, 2013
+ * Copyright (C) Stefan Hajnoczi <stefa...@redhat.com>, 2015
+ */
+
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE   (1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE(1024 * 64)
+
+struct vsock_transport_recv_notify_data;
+struct vsock_transport_send_notify_data;
+struct sockaddr_vm;
+struct vsock_sock;
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* virtio transport socket state */
+struct virtio_transport {
+   struct virtio_transport_pkt_ops *ops;
+   struct vsock_sock *vsk;
+
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   struct list_head rx_queue;
+   u32 rx_bytes;
+
+   /* Protected by trans->tx_lock */
+   u32 tx_cnt;
+   u32 buf_alloc;
+   u32 peer_fwd_cnt;
+   u32 peer_buf_alloc;
+   /* Protected by trans->rx_lock */
+   u32 fwd_cnt;
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct virtio_transport *trans;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 o

[PATCH v3 0/4] Add virtio transport for AF_VSOCK

2015-12-09 Thread Stefan Hajnoczi
Note: the virtio-vsock device specification is currently under review but not
yet finalized.  Please review this code but don't merge until I send an update
when the spec is finalized.  Thanks!

v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
   (also drop v2 Patch 1, it's only needed for SOCK_DGRAM)
 * Only allow host->guest connections (same security model as latest
   VMware)
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann <a...@arndb.de>)
 * Remove unneeded variable used to store return value
   (Fengguang Wu <fengguang...@intel.com> and Julia Lawall
   <julia.law...@lip6.fr>)

v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://permalink.gmane.org/gmane.comp.emulators.virtio.devel/980

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM semantics.  Applications on the host can easily connect to guest
agents because the sockets API allows multiple connections to a listen socket
(unlike virtio-serial).  This simplifies the guest<->host communication and
eliminates the need for extra processes on the host to arbitrate virtio-serial
ports.

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest must be assigned a CID (3 in the example above).
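
As an illustration of the Howto, a host-side client could be sketched as below. This assumes the `struct sockaddr_vm` and `AF_VSOCK` definitions from <linux/vm_sockets.h>; the helper names `make_vsock_addr()` and `vsock_connect()` are invented for the example, and any port number the caller passes (such as 1234) is arbitrary.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Build an AF_VSOCK address for the given CID and port. */
static struct sockaddr_vm make_vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/* Open a stream connection to (cid, port); returns the fd, or -1 on error. */
static int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr = make_vsock_addr(cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	assert(addr.svm_family == AF_VSOCK);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

On the host, `vsock_connect(3, 1234)` would try to reach a guest agent listening on port 1234 in the guest started with guest-cid=3. It returns -1 if no vsock transport is loaded or nothing is listening.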

Status
------
This patch series implements the latest draft specification.  Please review.

Asias He (4):
  VSOCK: Introduce virtio-vsock-common.ko
  VSOCK: Introduce virtio-vsock.ko
  VSOCK: Introduce vhost-vsock.ko
  VSOCK: Add Makefile and Kconfig

 drivers/vhost/Kconfig   |  10 +
 drivers/vhost/Makefile  |   4 +
 drivers/vhost/vsock.c   | 628 +++
 drivers/vhost/vsock.h   |   4 +
 include/linux/virtio_vsock.h| 203 
 include/uapi/linux/virtio_ids.h |   1 +
 include/uapi/linux/virtio_vsock.h   |  87 
 net/vmw_vsock/Kconfig   |  18 +
 net/vmw_vsock/Makefile  |   2 +
 net/vmw_vsock/virtio_transport.c| 466 +
 net/vmw_vsock/virtio_transport_common.c | 854 
 11 files changed, 2277 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport.c
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/4] VSOCK: Add Makefile and Kconfig

2015-12-09 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v3:
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann <a...@arndb.de>)
---
 drivers/vhost/Kconfig  | 10 ++
 drivers/vhost/Makefile |  4 
 net/vmw_vsock/Kconfig  | 18 ++
 net/vmw_vsock/Makefile |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..a1bb4c2 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -21,6 +21,16 @@ config VHOST_SCSI
Say M here to enable the vhost_scsi TCM fabric module
for use with virtio-scsi guests
 
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   select VHOST
+   select VHOST_RING
+   default n
+   ---help---
+   Say M here to enable the vhost-vsock for virtio-vsock guests
+
 config VHOST_RING
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..74e0bc8 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine runs on Qemu/KVM.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0



[PATCH v3 3/4] VSOCK: Introduce vhost-vsock.ko

2015-12-09 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets vhost transport implementation. This module runs in host
kernel.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v3:
 * Remove unneeded variable used to store return value
   (Fengguang Wu <fengguang...@intel.com> and Julia Lawall
   <julia.law...@lip6.fr>)
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
---
 drivers/vhost/vsock.c | 628 ++
 drivers/vhost/vsock.h |   4 +
 2 files changed, 632 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 000..3c0034a
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,628 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "vhost.h"
+#include "vsock.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+static int vhost_transport_socket_init(struct vsock_sock *vsk,
+  struct vsock_sock *psk);
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock_virtqueue {
+   struct vhost_virtqueue vq;
+};
+
+struct vhost_vsock {
+   /* Vhost device */
+   struct vhost_dev dev;
+   /* Vhost vsock virtqueue*/
+   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
+   /* Link to global vhost_vsock_list*/
+   struct list_head list;
+   /* Head for pkt from host to guest */
+   struct list_head send_pkt_list;
+   /* Work item to send pkt */
+   struct vhost_work send_pkt_work;
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest context id this vhost_vsock instance handles */
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   return VHOST_VSOCK_DEFAULT_HOST_CID;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   mutex_lock(&vhost_vsock_mutex);
+   list_for_each_entry(vsock, &vhost_vsock_list, list) {
+   if (vsock->guest_cid == guest_cid) {
+   mutex_unlock(&vhost_vsock_mutex);
+   return vsock;
+   }
+   }
+   mutex_unlock(&vhost_vsock_mutex);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   bool added = false;
+
+   mutex_lock(&vq->mutex);
+   vhost_disable_notify(&vsock->dev, vq);
+   for (;;) {
+   struct virtio_vsock_pkt *pkt;
+   struct iov_iter iov_iter;
+   unsigned out, in;
+   struct sock *sk;
+   size_t nbytes;
+   size_t len;
+   int head;
+
+   if (list_empty(&vsock->send_pkt_list)) {
+   vhost_enable_notify(&vsock->dev, vq);
+   break;
+   }
+
+   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+ &out, &in, NULL, NULL);
+   pr_debug("%s: head = %d\n", __func__, head);
+   if (head < 0)
+   break;
+
+   if (head == vq->num) {
+   if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+   vhost_disable_notify(&vsock->dev, vq);
+   continue;
+   }
+   break;
+   }
+
+   pkt = list_first_entry(&vsock->send_pkt_list,
+  struct virtio_vsock_pkt, list);
+   list_del_init(&pkt->list);
+
+   if (out) {
+   virtio_transport_free_pkt(pkt);
+   vq_err(vq, "Expected 0 output buffers, got %u\n", out);
+   break;
+   }
+
+   len = iov_length(&vq->iov[out], in);
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+
+   nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
+   if (nbytes != sizeof(pkt->hdr)) {
+   virtio_transport_free_pkt(pkt);
+   vq_err(vq, &quo

[PATCH v3 2/4] VSOCK: Introduce virtio-vsock.ko

2015-12-09 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets virtio transport implementation. This module runs in guest
kernel.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races
---
 net/vmw_vsock/virtio_transport.c | 466 +++
 1 file changed, 466 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 000..df65dca
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,466 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * Some of the code is taken from Gerd Hoffmann <kra...@redhat.com>'s
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   /* Virtio device */
+   struct virtio_device *vdev;
+   /* Virtio virtqueue */
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Work item to send pkt */
+   struct work_struct tx_work;
+   /* Work item to recv pkt */
+   struct work_struct rx_work;
+   /* Mutex to protect send pkt*/
+   struct mutex tx_lock;
+   /* Mutex to protect recv pkt*/
+   struct mutex rx_lock;
+   /* Number of recv buffers */
+   int rx_buf_nr;
+   /* Number of max recv buffers */
+   int rx_buf_max_nr;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest context id, just like guest ip address */
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtio_transport *trans;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   struct scatterlist hdr, buf, *sgs[2];
+   struct virtqueue *vq;
+   u32 pkt_len = info->pkt_len;
+   DEFINE_WAIT(wait);
+
+   vsock = virtio_vsock_get();
+   if (!vsock)
+   return -ENODEV;
+
+   src_cid = virtio_transport_get_local_cid();
+   src_port = vsk->local_addr.svm_port;
+   if (!info->remote_cid) {
+   dst_cid = vsk->remote_addr.svm_cid;
+   dst_port = vsk->remote_addr.svm_port;
+   } else {
+   dst_cid = info->remote_cid;
+   dst_port = info->remote_port;
+   }
+
+   trans = vsk->trans;
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+   pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+   pkt_len = virtio_transport_get_credit(trans, pkt_len);
+   /* Do not send zero length OP_RW pkt*/
+   if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+   return pkt_len;
+
+   /* Respect global tx buf limitation */
+   mutex_lock(&vsock->tx_lock);
+   while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
+   prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+   mutex_unlock(&vsock->tx_lock);
+   schedule();
+   mutex_lock(&vsock->tx_lock);
+   finish_wait(&vsock->queue_wait, &wait);
+   }
+   vsock->total_tx_buf += pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+
+   pkt = virtio_transport_alloc_pkt(vsk, info, pkt_len,
+src_cid, src_port,
+dst_cid, dst_port);
+   if (!pkt) {
+   mutex_lock(&vsock->tx_lock);
+   vsock->total_tx_buf -= pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+   virtio_transport_put_credit(trans, pkt_len);
+   return -ENOMEM;
+   }
+
+   pr_debug("%s:info->pkt_len= %d\n", __func__, info->pkt_len);
+
+   /* Will be released in virtio_transport_send_pkt_work */
+   sock_hold(>v

[PATCH 3/6] Revert "VSOCK: Introduce vhost-vsock.ko"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 98bb892821c1ad3781b8c7daec2fc8a8de3390c9.

Keep virtio-vsock out-of-tree until the device specification is
finalized.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 drivers/vhost/vsock.c | 631 --
 drivers/vhost/vsock.h |   4 -
 2 files changed, 635 deletions(-)
 delete mode 100644 drivers/vhost/vsock.c
 delete mode 100644 drivers/vhost/vsock.h

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
deleted file mode 100644
index 65b1cf8..000
--- a/drivers/vhost/vsock.c
+++ /dev/null
@@ -1,631 +0,0 @@
-/*
- * vhost transport for vsock
- *
- * Copyright (C) 2013-2015 Red Hat, Inc.
- * Author: Asias He <as...@redhat.com>
- *     Stefan Hajnoczi <stefa...@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2.
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include "vhost.h"
-#include "vsock.h"
-
-#define VHOST_VSOCK_DEFAULT_HOST_CID   2
-
-static int vhost_transport_socket_init(struct vsock_sock *vsk,
-  struct vsock_sock *psk);
-
-enum {
-   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
-};
-
-/* Used to track all the vhost_vsock instances on the system. */
-static LIST_HEAD(vhost_vsock_list);
-static DEFINE_MUTEX(vhost_vsock_mutex);
-
-struct vhost_vsock_virtqueue {
-   struct vhost_virtqueue vq;
-};
-
-struct vhost_vsock {
-   /* Vhost device */
-   struct vhost_dev dev;
-   /* Vhost vsock virtqueue*/
-   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
-   /* Link to global vhost_vsock_list*/
-   struct list_head list;
-   /* Head for pkt from host to guest */
-   struct list_head send_pkt_list;
-   /* Work item to send pkt */
-   struct vhost_work send_pkt_work;
-   /* Wait queue for send pkt */
-   wait_queue_head_t queue_wait;
-   /* Used for global tx buf limitation */
-   u32 total_tx_buf;
-   /* Guest contex id this vhost_vsock instance handles */
-   u32 guest_cid;
-};
-
-static u32 vhost_transport_get_local_cid(void)
-{
-   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
-   return cid;
-}
-
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
-{
-   struct vhost_vsock *vsock;
-
-   mutex_lock(&vhost_vsock_mutex);
-   list_for_each_entry(vsock, &vhost_vsock_list, list) {
-   if (vsock->guest_cid == guest_cid) {
-   mutex_unlock(&vhost_vsock_mutex);
-   return vsock;
-   }
-   }
-   mutex_unlock(&vhost_vsock_mutex);
-
-   return NULL;
-}
-
-static void
-vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
-   struct vhost_virtqueue *vq)
-{
-   bool added = false;
-
-   mutex_lock(&vq->mutex);
-   vhost_disable_notify(&vsock->dev, vq);
-   for (;;) {
-   struct virtio_vsock_pkt *pkt;
-   struct iov_iter iov_iter;
-   unsigned out, in;
-   struct sock *sk;
-   size_t nbytes;
-   size_t len;
-   int head;
-
-   if (list_empty(&vsock->send_pkt_list)) {
-   vhost_enable_notify(&vsock->dev, vq);
-   break;
-   }
-
-   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- &out, &in, NULL, NULL);
-   pr_debug("%s: head = %d\n", __func__, head);
-   if (head < 0)
-   break;
-
-   if (head == vq->num) {
-   if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
-   vhost_disable_notify(&vsock->dev, vq);
-   continue;
-   }
-   break;
-   }
-
-   pkt = list_first_entry(&vsock->send_pkt_list,
-  struct virtio_vsock_pkt, list);
-   list_del_init(&pkt->list);
-
-   if (out) {
-   virtio_transport_free_pkt(pkt);
-   vq_err(vq, "Expected 0 output buffers, got %u\n", out);
-   break;
-   }
-
-   len = iov_length(&vq->iov[out], in);
-   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
-
-   nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
-   if (nbytes != sizeof(pkt->hdr)) {
-   virtio_transport_free_pkt(pkt);
-   vq_err(vq, "Faulted on copying pkt hdr\n");
-   break;
-   }
-
-   nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
-   if (nbytes != pkt->len) {
-   virtio_transport_free_pkt(pkt);
-   vq_err(vq, "Faulted on copying pkt buf\n");
-

[PATCH 2/6] Revert "VSOCK: Add Makefile and Kconfig"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 8a2a2029893b4c35d1aba2932111a1a164b9c948.

Keep virtio-vsock out-of-tree until the device specification is
finalized.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 drivers/vhost/Kconfig   |  4 
 drivers/vhost/Kconfig.vsock |  7 ---
 drivers/vhost/Makefile  |  4 
 net/vmw_vsock/Kconfig   | 18 --
 net/vmw_vsock/Makefile  |  2 --
 5 files changed, 35 deletions(-)
 delete mode 100644 drivers/vhost/Kconfig.vsock

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 81449bf..533eaf0 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -47,7 +47,3 @@ config VHOST_CROSS_ENDIAN_LEGACY
  adds some overhead, it is disabled by default.
 
  If unsure, say "N".
-
-if STAGING
-source "drivers/vhost/Kconfig.vsock"
-endif
diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
deleted file mode 100644
index 3491865..000
--- a/drivers/vhost/Kconfig.vsock
+++ /dev/null
@@ -1,7 +0,0 @@
-config VHOST_VSOCK
-   tristate "vhost virtio-vsock driver"
-   depends on VSOCKETS && EVENTFD
-   select VIRTIO_VSOCKETS_COMMON
-   default n
-   ---help---
-   Say M here to enable the vhost-vsock for virtio-vsock guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6b012b9..e0441c3 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,9 +4,5 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
-obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
-vhost_vsock-y := vsock.o
-
 obj-$(CONFIG_VHOST_RING) += vringh.o
-
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 74e0bc8..14810ab 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,21 +26,3 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
-
-config VIRTIO_VSOCKETS
-   tristate "virtio transport for Virtual Sockets"
-   depends on VSOCKETS && VIRTIO
-   select VIRTIO_VSOCKETS_COMMON
-   help
- This module implements a virtio transport for Virtual Sockets.
-
- Enable this transport if your Virtual Machine runs on Qemu/KVM.
-
- To compile this driver as a module, choose M here: the module
- will be called virtio_vsock_transport. If unsure, say N.
-
-config VIRTIO_VSOCKETS_COMMON
-   tristate
-   ---help---
- This option is selected by any driver which needs to access
- the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index cf4c294..2ce52d7 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,7 +1,5 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
-obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
-obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] Revert "VSOCK: Introduce virtio-vsock-common.ko"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 80a19e338d458abb5a700df3fd00795c51361f06.

Keep virtio-vsock out-of-tree until the device specification is
finalized.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/linux/virtio_vsock.h|  209 -
 include/uapi/linux/virtio_ids.h |1 -
 include/uapi/linux/virtio_vsock.h   |   89 ---
 net/vmw_vsock/virtio_transport_common.c | 1272 ---
 4 files changed, 1571 deletions(-)
 delete mode 100644 include/linux/virtio_vsock.h
 delete mode 100644 include/uapi/linux/virtio_vsock.h
 delete mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
deleted file mode 100644
index a5f3ecc..0000000
--- a/include/linux/virtio_vsock.h
+++ /dev/null
@@ -1,209 +0,0 @@
-/*
- * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
- * anyone can use the definitions to implement compatible drivers/servers:
- *
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *notice, this list of conditions and the following disclaimer in the
- *documentation and/or other materials provided with the distribution.
- * 3. Neither the name of IBM nor the names of its contributors
- *may be used to endorse or promote products derived from this software
- *without specific prior written permission.
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- *
- * Copyright (C) Red Hat, Inc., 2013-2015
- * Copyright (C) Asias He <as...@redhat.com>, 2013
- * Copyright (C) Stefan Hajnoczi <stefa...@redhat.com>, 2015
- */
-
-#ifndef _LINUX_VIRTIO_VSOCK_H
-#define _LINUX_VIRTIO_VSOCK_H
-
-#include 
-#include 
-#include 
-
-#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
-#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
-#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
-#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
-#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xUL
-#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
-#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE   (1024 * 1024 * 16)
-#define VIRTIO_VSOCK_MAX_DGRAM_SIZE (1024 * 64)
-
-struct vsock_transport_recv_notify_data;
-struct vsock_transport_send_notify_data;
-struct sockaddr_vm;
-struct vsock_sock;
-
-enum {
-   VSOCK_VQ_CTRL   = 0,
-   VSOCK_VQ_RX = 1, /* for host to guest data */
-   VSOCK_VQ_TX = 2, /* for guest to host data */
-   VSOCK_VQ_MAX= 3,
-};
-
-/* virtio transport socket state */
-struct virtio_transport {
-   struct virtio_transport_pkt_ops *ops;
-   struct vsock_sock *vsk;
-
-   u32 buf_size;
-   u32 buf_size_min;
-   u32 buf_size_max;
-
-   struct mutex tx_lock;
-   struct mutex rx_lock;
-
-   struct list_head rx_queue;
-   u32 rx_bytes;
-
-   /* Protected by trans->tx_lock */
-   u32 tx_cnt;
-   u32 buf_alloc;
-   u32 peer_fwd_cnt;
-   u32 peer_buf_alloc;
-   /* Protected by trans->rx_lock */
-   u32 fwd_cnt;
-
-   /* Protected by sk_lock */
-   u16 dgram_id;
-   struct list_head incomplete_dgrams; /* dgram fragments */
-};
-
-struct virtio_vsock_pkt {
-   struct virtio_vsock_hdr hdr;
-   struct virtio_transport *trans;
-   struct work_struct work;
-   struct list_head list;
-   void *buf;
-   u32 len;
-   u32 off;
-};
-
-struct virtio_vsock_pkt_info {
-   u32 remote_cid, remote_port;
-   struct msghdr *msg;
-   u32 pkt_len;
-   u16 type;
-   u16 op;
-   u32 flags;
-   u16 dgram_id;
-   u16 dgram_len;
-};
-
-struct virtio_transport_pkt_ops {
-   int (*send_pkt)(struct vsock_sock *vsk,
-   struct virtio_vsock_pkt_info *info);
-};
-
-void virtio_vsock_dumppkt(const char *func,
- const struct virtio_vsock_pkt *pkt);
-
-struct sock *
-virtio_transport_get_pendi

[PATCH 6/6] Revert "VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 357ab2234d57f6c74386f64ded42dff8e3c0500b.

Keep virtio-vsock out-of-tree until the device specification is
finalized.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/net/af_vsock.h   |  2 --
 net/vmw_vsock/af_vsock.c | 70 
 2 files changed, 72 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index a0c8fa2..e9eb2d6 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -175,10 +175,8 @@ void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
-struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
-int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 77247a2..7fd1220 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -223,17 +223,6 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
 }
 
-static struct sock *__vsock_find_unbound_socket(struct sockaddr_vm *addr)
-{
-   struct vsock_sock *vsk;
-
-   list_for_each_entry(vsk, vsock_unbound_sockets, bound_table)
-   if (addr->svm_port == vsk->local_addr.svm_port)
-   return sk_vsock(vsk);
-
-   return NULL;
-}
-
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
  struct sockaddr_vm *dst)
 {
@@ -309,21 +298,6 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 }
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
-struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr)
-{
-   struct sock *sk;
-
-   spin_lock_bh(&vsock_table_lock);
-   sk = __vsock_find_unbound_socket(addr);
-   if (sk)
-       sock_hold(sk);
-
-   spin_unlock_bh(&vsock_table_lock);
-
-   return sk;
-}
-EXPORT_SYMBOL_GPL(vsock_find_unbound_socket);
-
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst)
 {
@@ -558,50 +532,6 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
return 0;
 }
 
-int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr)
-{
-   static u32 port = LAST_RESERVED_PORT + 1;
-   struct sockaddr_vm new_addr;
-
-   vsock_addr_init(&new_addr, addr->svm_cid, addr->svm_port);
-
-   if (addr->svm_port == VMADDR_PORT_ANY) {
-       bool found = false;
-       unsigned int i;
-
-       for (i = 0; i < MAX_PORT_RETRIES; i++) {
-           if (port <= LAST_RESERVED_PORT)
-               port = LAST_RESERVED_PORT + 1;
-
-           new_addr.svm_port = port++;
-
-           if (!__vsock_find_unbound_socket(&new_addr)) {
-               found = true;
-               break;
-           }
-       }
-
-       if (!found)
-           return -EADDRNOTAVAIL;
-   } else {
-       /* If port is in reserved range, ensure caller
-        * has necessary privileges.
-        */
-       if (addr->svm_port <= LAST_RESERVED_PORT &&
-           !capable(CAP_NET_BIND_SERVICE)) {
-           return -EACCES;
-       }
-
-       if (__vsock_find_unbound_socket(&new_addr))
-           return -EADDRINUSE;
-   }
-
-   vsock_addr_init(&vsk->local_addr, new_addr.svm_cid, new_addr.svm_port);
-
-   return 0;
-}
-EXPORT_SYMBOL_GPL(vsock_bind_dgram_generic);
-
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
  struct sockaddr_vm *addr)
 {
-- 
2.5.0



[PATCH 4/6] Revert "VSOCK: Introduce virtio-vsock.ko"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 32e61b06b6946ba137723c5b1de2a1fdb2e0e0a5.

Keep virtio-vsock out-of-tree until the device specification is
finalized.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 net/vmw_vsock/virtio_transport.c | 466 ---
 1 file changed, 466 deletions(-)
 delete mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
deleted file mode 100644
index df65dca..0000000
--- a/net/vmw_vsock/virtio_transport.c
+++ /dev/null
@@ -1,466 +0,0 @@
-/*
- * virtio transport for vsock
- *
- * Copyright (C) 2013-2015 Red Hat, Inc.
- * Author: Asias He <as...@redhat.com>
- *     Stefan Hajnoczi <stefa...@redhat.com>
- *
- * Some of the code is take from Gerd Hoffmann <kra...@redhat.com>'s
- * early virtio-vsock proof-of-concept bits.
- *
- * This work is licensed under the terms of the GNU GPL, version 2.
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static struct workqueue_struct *virtio_vsock_workqueue;
-static struct virtio_vsock *the_virtio_vsock;
-static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
-static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
-
-struct virtio_vsock {
-   /* Virtio device */
-   struct virtio_device *vdev;
-   /* Virtio virtqueue */
-   struct virtqueue *vqs[VSOCK_VQ_MAX];
-   /* Wait queue for send pkt */
-   wait_queue_head_t queue_wait;
-   /* Work item to send pkt */
-   struct work_struct tx_work;
-   /* Work item to recv pkt */
-   struct work_struct rx_work;
-   /* Mutex to protect send pkt*/
-   struct mutex tx_lock;
-   /* Mutex to protect recv pkt*/
-   struct mutex rx_lock;
-   /* Number of recv buffers */
-   int rx_buf_nr;
-   /* Number of max recv buffers */
-   int rx_buf_max_nr;
-   /* Used for global tx buf limitation */
-   u32 total_tx_buf;
-   /* Guest context id, just like guest ip address */
-   u32 guest_cid;
-};
-
-static struct virtio_vsock *virtio_vsock_get(void)
-{
-   return the_virtio_vsock;
-}
-
-static u32 virtio_transport_get_local_cid(void)
-{
-   struct virtio_vsock *vsock = virtio_vsock_get();
-
-   return vsock->guest_cid;
-}
-
-static int
-virtio_transport_send_pkt(struct vsock_sock *vsk,
- struct virtio_vsock_pkt_info *info)
-{
-   u32 src_cid, src_port, dst_cid, dst_port;
-   int ret, in_sg = 0, out_sg = 0;
-   struct virtio_transport *trans;
-   struct virtio_vsock_pkt *pkt;
-   struct virtio_vsock *vsock;
-   struct scatterlist hdr, buf, *sgs[2];
-   struct virtqueue *vq;
-   u32 pkt_len = info->pkt_len;
-   DEFINE_WAIT(wait);
-
-   vsock = virtio_vsock_get();
-   if (!vsock)
-   return -ENODEV;
-
-   src_cid = virtio_transport_get_local_cid();
-   src_port = vsk->local_addr.svm_port;
-   if (!info->remote_cid) {
-   dst_cid = vsk->remote_addr.svm_cid;
-   dst_port = vsk->remote_addr.svm_port;
-   } else {
-   dst_cid = info->remote_cid;
-   dst_port = info->remote_port;
-   }
-
-   trans = vsk->trans;
-   vq = vsock->vqs[VSOCK_VQ_TX];
-
-   if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
-   pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
-   pkt_len = virtio_transport_get_credit(trans, pkt_len);
-   /* Do not send zero length OP_RW pkt*/
-   if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
-   return pkt_len;
-
-   /* Respect global tx buf limitation */
-   mutex_lock(&vsock->tx_lock);
-   while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
-       prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
-                                 TASK_UNINTERRUPTIBLE);
-       mutex_unlock(&vsock->tx_lock);
-       schedule();
-       mutex_lock(&vsock->tx_lock);
-       finish_wait(&vsock->queue_wait, &wait);
-   }
-   vsock->total_tx_buf += pkt_len;
-   mutex_unlock(&vsock->tx_lock);
-
-   pkt = virtio_transport_alloc_pkt(vsk, info, pkt_len,
-                                    src_cid, src_port,
-                                    dst_cid, dst_port);
-   if (!pkt) {
-       mutex_lock(&vsock->tx_lock);
-       vsock->total_tx_buf -= pkt_len;
-       mutex_unlock(&vsock->tx_lock);
-       virtio_transport_put_credit(trans, pkt_len);
-       return -ENOMEM;
-   }
-
-   pr_debug("%s:info->pkt_len= %d\n", __func__, info->pkt_len);
-
-   /* Will be released in virtio_transport_send_pkt_work */
-   sock_hold(&trans->vsk->sk);
-   virtio_transport_inc_tx_pkt(pkt);
-
-   /* Put pkt in the virtqueue */
-   sg_init_one(, >

[PATCH 1/6] Revert "VSOCK: fix returnvar.cocci warnings"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 0d76d6e8b2507983a2cae4c09880798079007421.

Keep virtio-vsock out-of-tree until the virtio-vsock device
specification is finalized.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 drivers/vhost/vsock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 64bcb10..65b1cf8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -56,7 +56,8 @@ struct vhost_vsock {
 
 static u32 vhost_transport_get_local_cid(void)
 {
-   return VHOST_VSOCK_DEFAULT_HOST_CID;
+   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
+   return cid;
 }
 
 static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
-- 
2.5.0



[PATCH 0/6] VSOCK: revert virtio-vsock until device spec is finalized

2015-12-08 Thread Stefan Hajnoczi
The virtio-vsock device specification is not finalized yet.  Michael Tsirkin
voiced concern about merging this code when the hardware interface (and
possibly the userspace interface) could still change.

Please revert for now.

I am working to finalize the virtio-vsock device specification and at that
point the interfaces will be stable.

Stefan Hajnoczi (6):
  Revert "VSOCK: fix returnvar.cocci warnings"
  Revert "VSOCK: Add Makefile and Kconfig"
  Revert "VSOCK: Introduce vhost-vsock.ko"
  Revert "VSOCK: Introduce virtio-vsock.ko"
  Revert "VSOCK: Introduce virtio-vsock-common.ko"
  Revert "VSOCK: Introduce vsock_find_unbound_socket and
vsock_bind_dgram_generic"

 drivers/vhost/Kconfig   |4 -
 drivers/vhost/Kconfig.vsock |7 -
 drivers/vhost/Makefile  |4 -
 drivers/vhost/vsock.c   |  630 ---
 drivers/vhost/vsock.h   |4 -
 include/linux/virtio_vsock.h|  209 -
 include/net/af_vsock.h  |2 -
 include/uapi/linux/virtio_ids.h |1 -
 include/uapi/linux/virtio_vsock.h   |   89 ---
 net/vmw_vsock/Kconfig   |   18 -
 net/vmw_vsock/Makefile  |2 -
 net/vmw_vsock/af_vsock.c|   70 --
 net/vmw_vsock/virtio_transport.c|  466 ---
 net/vmw_vsock/virtio_transport_common.c | 1272 ---
 14 files changed, 2778 deletions(-)
 delete mode 100644 drivers/vhost/Kconfig.vsock
 delete mode 100644 drivers/vhost/vsock.c
 delete mode 100644 drivers/vhost/vsock.h
 delete mode 100644 include/linux/virtio_vsock.h
 delete mode 100644 include/uapi/linux/virtio_vsock.h
 delete mode 100644 net/vmw_vsock/virtio_transport.c
 delete mode 100644 net/vmw_vsock/virtio_transport_common.c

-- 
2.5.0



Re: [PATCH v2 0/5] Add virtio transport for AF_VSOCK

2015-12-08 Thread Stefan Hajnoczi
On Fri, Dec 04, 2015 at 09:45:04AM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2015 at 02:43:58PM +0800, Stefan Hajnoczi wrote:
> > 1. The 3-way handshake isn't necessary over a reliable transport
> >    (virtqueue).  Spoofing packets is also impossible so the security
> >    aspects of the 3-way handshake (including syn cookie) add nothing.
> >    The next version will have a single operation to establish a
> >    connection.
> 
> It's hard to discuss without seeing the details, but we do need to
> slow down guests that are flooding host with socket creation requests.
> The handshake is a simple way for hypervisor to defer
> such requests until it has resources without
> breaking things.

I'll send an updated virtio-vsock device specification so we can discuss
it.

VMCI simply uses sk->sk_max_ack_backlog in
net/vmw_vsock/vmci_transport.c.  If backlog (from listen(2)) is maxed
out then the connection is refused.

The same would work for virtio-vsock and there is no advantage to the
3-way handshake.
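
As a rough illustration of the backlog check described above, here is a
minimal userspace sketch.  struct listener and listener_try_queue() are
hypothetical names made up for this example; in the kernel the counters
live in struct sock (sk_ack_backlog and sk_max_ack_backlog), and this is
not the VMCI code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the listening socket's backlog counters. */
struct listener {
	uint32_t ack_backlog;     /* connections waiting for accept(2) */
	uint32_t max_ack_backlog; /* backlog argument passed to listen(2) */
};

/* Queue one incoming connection request if the backlog has room.
 * Returns false when the backlog is full, i.e. the connection is refused.
 */
bool listener_try_queue(struct listener *l)
{
	if (l->ack_backlog >= l->max_ack_backlog)
		return false;

	l->ack_backlog++;
	return true;
}
```

A listener created with a backlog of 2 accepts two pending requests and
refuses the third until accept(2) drains the queue, so a flooding guest
is throttled without any handshake state.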

> > 2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple
> >    clients can transmit to the same listen socket.  There is no way for
> >    the clients to coordinate buffer space with each other fairly.  The
> >    next version will drop credit-based flow control for SOCK_DGRAM and
> >    only rely on best-effort delivery.  SOCK_STREAM still has guaranteed
> >    delivery.
> 
> I suspect in the end we will need a measure of fairness even
> if you drop packets. And recovering from packet loss is
> hard enough that not many applications do it correctly.
> I suggest disabling SOCK_DGRAM for now.

I'm not aware of a SOCK_DGRAM user at this time.  Will disable it for
now.




Re: [PATCH 0/6] VSOCK: revert virtio-vsock until device spec is finalized

2015-12-08 Thread Stefan Hajnoczi
On Tue, Dec 08, 2015 at 11:26:55AM -0500, David Miller wrote:
> From: Stefan Hajnoczi <stefa...@redhat.com>
> Date: Tue,  8 Dec 2015 19:57:30 +0800
> 
> > Please revert for now.
> 
> Please don't revert it piece by piece like this.
> 
> Instead, send me one big revert commit that undoes the whole
> thing.  There is even a merge commit that you can use to
> create that revert cleanly.

Okay.

Stefan




[PATCH v2] Revert "Merge branch 'vsock-virtio'"

2015-12-08 Thread Stefan Hajnoczi
This reverts commit 0d76d6e8b2507983a2cae4c09880798079007421 and merge
commit c402293bd76fbc93e52ef8c0947ab81eea3ae019, reversing changes made
to c89359a42e2a49656451569c382eed63e781153c.

The virtio-vsock device specification is not finalized yet.  Michael
Tsirkin voiced concern about merging this code when the hardware
interface (and possibly the userspace interface) could still change.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v2:
 * Revert merge commit and coccinelle fixup in a single patch

 drivers/vhost/Kconfig   |4 -
 drivers/vhost/Kconfig.vsock |7 -
 drivers/vhost/Makefile  |4 -
 drivers/vhost/vsock.c   |  630 ---
 drivers/vhost/vsock.h   |4 -
 include/linux/virtio_vsock.h|  209 -
 include/net/af_vsock.h  |2 -
 include/uapi/linux/virtio_ids.h |1 -
 include/uapi/linux/virtio_vsock.h   |   89 ---
 net/vmw_vsock/Kconfig   |   18 -
 net/vmw_vsock/Makefile  |2 -
 net/vmw_vsock/af_vsock.c|   70 --
 net/vmw_vsock/virtio_transport.c|  466 ---
 net/vmw_vsock/virtio_transport_common.c | 1272 ---
 14 files changed, 2778 deletions(-)
 delete mode 100644 drivers/vhost/Kconfig.vsock
 delete mode 100644 drivers/vhost/vsock.c
 delete mode 100644 drivers/vhost/vsock.h
 delete mode 100644 include/linux/virtio_vsock.h
 delete mode 100644 include/uapi/linux/virtio_vsock.h
 delete mode 100644 net/vmw_vsock/virtio_transport.c
 delete mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 81449bf..533eaf0 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -47,7 +47,3 @@ config VHOST_CROSS_ENDIAN_LEGACY
  adds some overhead, it is disabled by default.
 
  If unsure, say "N".
-
-if STAGING
-source "drivers/vhost/Kconfig.vsock"
-endif
diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
deleted file mode 100644
index 3491865..0000000
--- a/drivers/vhost/Kconfig.vsock
+++ /dev/null
@@ -1,7 +0,0 @@
-config VHOST_VSOCK
-   tristate "vhost virtio-vsock driver"
-   depends on VSOCKETS && EVENTFD
-   select VIRTIO_VSOCKETS_COMMON
-   default n
-   ---help---
-   Say M here to enable the vhost-vsock for virtio-vsock guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6b012b9..e0441c3 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,9 +4,5 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
-obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
-vhost_vsock-y := vsock.o
-
 obj-$(CONFIG_VHOST_RING) += vringh.o
-
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
deleted file mode 100644
index 64bcb10..0000000
--- a/drivers/vhost/vsock.c
+++ /dev/null
@@ -1,630 +0,0 @@
-/*
- * vhost transport for vsock
- *
- * Copyright (C) 2013-2015 Red Hat, Inc.
- * Author: Asias He <as...@redhat.com>
- * Stefan Hajnoczi <stefa...@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2.
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include "vhost.h"
-#include "vsock.h"
-
-#define VHOST_VSOCK_DEFAULT_HOST_CID   2
-
-static int vhost_transport_socket_init(struct vsock_sock *vsk,
-  struct vsock_sock *psk);
-
-enum {
-   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
-};
-
-/* Used to track all the vhost_vsock instances on the system. */
-static LIST_HEAD(vhost_vsock_list);
-static DEFINE_MUTEX(vhost_vsock_mutex);
-
-struct vhost_vsock_virtqueue {
-   struct vhost_virtqueue vq;
-};
-
-struct vhost_vsock {
-   /* Vhost device */
-   struct vhost_dev dev;
-   /* Vhost vsock virtqueue*/
-   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
-   /* Link to global vhost_vsock_list*/
-   struct list_head list;
-   /* Head for pkt from host to guest */
-   struct list_head send_pkt_list;
-   /* Work item to send pkt */
-   struct vhost_work send_pkt_work;
-   /* Wait queue for send pkt */
-   wait_queue_head_t queue_wait;
-   /* Used for global tx buf limitation */
-   u32 total_tx_buf;
-   /* Guest contex id this vhost_vsock instance handles */
-   u32 guest_cid;
-};
-
-static u32 vhost_transport_get_local_cid(void)
-{
-   return VHOST_VSOCK_DEFAULT_HOST_CID;
-}
-
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
-{
-   struct vhost_vsock *vsock;
-
-   mutex_lock(&vhost_vsock_mutex);
-   list_for_each_entry(vsock, &vhost_vsock_list, list) {
-       if (vsock->guest_cid == guest_cid) {
-           mutex_unlock(&vhost_vsock_mutex);
-           return vsock;
-

Re: [PATCH] VSOCK: fix returnvar.cocci warnings

2015-12-06 Thread Stefan Hajnoczi
On Sun, Dec 06, 2015 at 06:56:23AM +0100, Julia Lawall wrote:
> Remove unneeded variable used to store return value.
> 
> Generated by: scripts/coccinelle/misc/returnvar.cocci
> 
> CC: Asias He <as...@redhat.com>
> Signed-off-by: Fengguang Wu <fengguang...@intel.com>
> Signed-off-by: Julia Lawall <julia.law...@lip6.fr>
> 
> ---
> 
>  vsock.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -56,8 +56,7 @@ struct vhost_vsock {
>  
>  static u32 vhost_transport_get_local_cid(void)
>  {
> - u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
> - return cid;
> + return VHOST_VSOCK_DEFAULT_HOST_CID;
>  }
>  
>  static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>




[PATCH] VSOCK: mark virtio_transport.ko experimental

2015-12-03 Thread Stefan Hajnoczi
Be explicit that the virtio_transport.ko code implements a draft virtio
specification that is still subject to change.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
If you'd rather wait until the device specification has been finalized, feel
free to revert the virtio-vsock code for now.  Apologies for not mentioning the
status in the Kconfig earlier.

 net/vmw_vsock/Kconfig | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 74e0bc8..d8be850 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -28,12 +28,17 @@ config VMWARE_VMCI_VSOCKETS
  will be called vmw_vsock_vmci_transport. If unsure, say N.
 
 config VIRTIO_VSOCKETS
-   tristate "virtio transport for Virtual Sockets"
+   tristate "virtio transport for Virtual Sockets (Experimental)"
depends on VSOCKETS && VIRTIO
select VIRTIO_VSOCKETS_COMMON
+   default n
help
  This module implements a virtio transport for Virtual Sockets.
 
+ This feature is based on a draft of the virtio-vsock device
+ specification that is still subject to change.  It can be used
+ to begin developing applications that use Virtual Sockets.
+
  Enable this transport if your Virtual Machine runs on Qemu/KVM.
 
  To compile this driver as a module, choose M here: the module
-- 
2.5.0



[PATCH v2 0/5] Add virtio transport for AF_VSOCK

2015-12-01 Thread Stefan Hajnoczi
v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket
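
The two vhost checks in the list above (reserved CIDs and correctly
addressed packets) could look roughly like the sketch below.  This is not
the driver code: the helper names are hypothetical, and it assumes that
"reserved" means the well-known CIDs 0 to 2 plus the VMADDR_CID_ANY
wildcard, and that a guest packet must carry the guest's own CID as
source and the host CID as destination.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_CID_HOST 2u         /* host CID used by this series */
#define EXAMPLE_CID_ANY  UINT32_MAX /* wildcard, VMADDR_CID_ANY (-1) */

/* Assumption: CIDs 0..2 and the wildcard may never be given to a guest. */
bool guest_cid_is_assignable(uint32_t cid)
{
	return cid > EXAMPLE_CID_HOST && cid != EXAMPLE_CID_ANY;
}

/* Assumption: the host only accepts packets whose source is the CID
 * assigned to the sending guest and whose destination is the host.
 */
bool pkt_is_correctly_addressed(uint32_t src_cid, uint32_t dst_cid,
				uint32_t guest_cid)
{
	return src_cid == guest_cid && dst_cid == EXAMPLE_CID_HOST;
}
```

With guest-cid=3, a packet claiming src_cid 4 would be dropped, which is
what makes spoofing another guest's address impossible.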

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM and SOCK_DGRAM semantics.  Applications on the host can easily
connect to guest agents because the sockets API allows multiple connections to
a listen socket (unlike virtio-serial).  This simplifies the guest<->host
communication and eliminates the need for extra processes on the host to
arbitrate virtio-serial ports.

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
bind to it).
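
For readers new to AF_VSOCK, a guest-side client might be sketched as
follows.  The helper names and the port number in the usage note are
hypothetical; struct sockaddr_vm and VMADDR_CID_HOST come from
<linux/vm_sockets.h>.  Treat this as a sketch, not as code from this
series.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill in an AF_VSOCK address for the given CID and port. */
void example_vsock_addr_fill(struct sockaddr_vm *addr,
			     unsigned int cid, unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/* Open a SOCK_STREAM connection to (cid, port).
 * Returns a connected file descriptor, or -1 on failure.
 */
int example_vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;
	int fd;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	example_vsock_addr_fill(&addr, cid, port);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

A guest would reach a host service with something like
example_vsock_connect(VMADDR_CID_HOST, 1234), where 1234 is an arbitrary
port chosen for illustration.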

Status
------
There are a few design changes I'd like to make to the virtio-vsock device:

1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
   Spoofing packets is also impossible so the security aspects of the 3-way
   handshake (including syn cookie) add nothing.  The next version will have a
   single operation to establish a connection.

2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple clients
   can transmit to the same listen socket.  There is no way for the clients to
   coordinate buffer space with each other fairly.  The next version will drop
   credit-based flow control for SOCK_DGRAM and only rely on best-effort
   delivery.  SOCK_STREAM still has guaranteed delivery.

3. In the next version only the host will be able to establish connections
   (i.e. to connect to a guest agent).  This is for security reasons since
   there is currently no ability to provide host services only to certain
   guests.  This also matches how AF_VSOCK works on modern VMware hypervisors.
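
To make the credit-based flow control in item 2 concrete, here is a small
userspace sketch of how a sender could compute its remaining credit.  The
field names mirror tx_cnt, peer_fwd_cnt and peer_buf_alloc from struct
virtio_transport in this series; the struct, the helper names and the
exact arithmetic are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical per-connection credit state kept by the sender. */
struct credit_state {
	uint32_t tx_cnt;         /* total bytes sent so far */
	uint32_t peer_fwd_cnt;   /* total bytes the peer reports consuming */
	uint32_t peer_buf_alloc; /* receive buffer size advertised by peer */
};

/* Bytes that may still be sent without overrunning the peer's buffer. */
uint32_t credit_available(const struct credit_state *s)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	uint32_t in_flight = s->tx_cnt - s->peer_fwd_cnt;

	if (in_flight >= s->peer_buf_alloc)
		return 0;

	return s->peer_buf_alloc - in_flight;
}

/* Reserve up to 'want' bytes of credit and return the amount granted. */
uint32_t credit_take(struct credit_state *s, uint32_t want)
{
	uint32_t avail = credit_available(s);
	uint32_t granted = want < avail ? want : avail;

	s->tx_cnt += granted;
	return granted;
}
```

This works for a single SOCK_STREAM peer because exactly one sender
consumes the advertised buffer.  With SOCK_DGRAM several senders would
each see the same peer_buf_alloc and could jointly overrun it, which is
the fairness problem described above.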

Asias He (5):
  VSOCK: Introduce vsock_find_unbound_socket and
vsock_bind_dgram_generic
  VSOCK: Introduce virtio-vsock-common.ko
  VSOCK: Introduce virtio-vsock.ko
  VSOCK: Introduce vhost-vsock.ko
  VSOCK: Add Makefile and Kconfig

 drivers/vhost/Kconfig   |4 +
 drivers/vhost/Kconfig.vsock |7 +
 drivers/vhost/Makefile  |4 +
 drivers/vhost/vsock.c   |  631 +++
 drivers/vhost/vsock.h   |4 +
 include/linux/virtio_vsock.h|  209 +
 include/net/af_vsock.h  |2 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vsock.h   |   89 +++
 net/vmw_vsock/Kconfig   |   18 +
 net/vmw_vsock/Makefile  |2 +
 net/vmw_vsock/af_vsock.c|   70 ++
 net/vmw_vsock/virtio_transport.c|  466 +++
 net/vmw_vsock/virtio_transport_common.c | 1272 +++
 14 files changed, 2779 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport.c
 create mode 100644 

[PATCH v2 3/5] VSOCK: Introduce virtio-vsock.ko

2015-12-01 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets virtio transport implementation. This module runs in guest
kernel.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races
---
 net/vmw_vsock/virtio_transport.c | 466 +++
 1 file changed, 466 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 0000000..df65dca
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,466 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * Some of the code is take from Gerd Hoffmann <kra...@redhat.com>'s
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   /* Virtio device */
+   struct virtio_device *vdev;
+   /* Virtio virtqueue */
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Work item to send pkt */
+   struct work_struct tx_work;
+   /* Work item to recv pkt */
+   struct work_struct rx_work;
+   /* Mutex to protect send pkt*/
+   struct mutex tx_lock;
+   /* Mutex to protect recv pkt*/
+   struct mutex rx_lock;
+   /* Number of recv buffers */
+   int rx_buf_nr;
+   /* Number of max recv buffers */
+   int rx_buf_max_nr;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest context id, just like guest ip address */
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtio_transport *trans;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   struct scatterlist hdr, buf, *sgs[2];
+   struct virtqueue *vq;
+   u32 pkt_len = info->pkt_len;
+   DEFINE_WAIT(wait);
+
+   vsock = virtio_vsock_get();
+   if (!vsock)
+   return -ENODEV;
+
+   src_cid = virtio_transport_get_local_cid();
+   src_port = vsk->local_addr.svm_port;
+   if (!info->remote_cid) {
+   dst_cid = vsk->remote_addr.svm_cid;
+   dst_port = vsk->remote_addr.svm_port;
+   } else {
+   dst_cid = info->remote_cid;
+   dst_port = info->remote_port;
+   }
+
+   trans = vsk->trans;
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+   pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+   pkt_len = virtio_transport_get_credit(trans, pkt_len);
+   /* Do not send zero length OP_RW pkt*/
+   if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+   return pkt_len;
+
+   /* Respect global tx buf limitation */
+   mutex_lock(&vsock->tx_lock);
+   while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
+   prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+   mutex_unlock(&vsock->tx_lock);
+   schedule();
+   mutex_lock(&vsock->tx_lock);
+   finish_wait(&vsock->queue_wait, &wait);
+   }
+   vsock->total_tx_buf += pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+
+   pkt = virtio_transport_alloc_pkt(vsk, info, pkt_len,
+src_cid, src_port,
+dst_cid, dst_port);
+   if (!pkt) {
+   mutex_lock(&vsock->tx_lock);
+   vsock->total_tx_buf -= pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+   virtio_transport_put_credit(trans, pkt_len);
+   return -ENOMEM;
+   }
+
+   pr_debug("%s:info->pkt_len= %d\n", __func__, info->pkt_len);
+
+   /* Will be released in virtio_transport_send_pkt_work */
+   sock_hold(>v

[PATCH v2 4/5] VSOCK: Introduce vhost-vsock.ko

2015-12-01 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets vhost transport implementation. This module runs in host
kernel.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
---
 drivers/vhost/vsock.c | 631 ++
 drivers/vhost/vsock.h |   4 +
 2 files changed, 635 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 000..65b1cf8
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,631 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "vhost.h"
+#include "vsock.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+static int vhost_transport_socket_init(struct vsock_sock *vsk,
+  struct vsock_sock *psk);
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock_virtqueue {
+   struct vhost_virtqueue vq;
+};
+
+struct vhost_vsock {
+   /* Vhost device */
+   struct vhost_dev dev;
+   /* Vhost vsock virtqueue*/
+   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
+   /* Link to global vhost_vsock_list*/
+   struct list_head list;
+   /* Head for pkt from host to guest */
+   struct list_head send_pkt_list;
+   /* Work item to send pkt */
+   struct vhost_work send_pkt_work;
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest context id this vhost_vsock instance handles */
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
+   return cid;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   mutex_lock(&vhost_vsock_mutex);
+   list_for_each_entry(vsock, &vhost_vsock_list, list) {
+   if (vsock->guest_cid == guest_cid) {
+   mutex_unlock(&vhost_vsock_mutex);
+   return vsock;
+   }
+   }
+   mutex_unlock(&vhost_vsock_mutex);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   bool added = false;
+
+   mutex_lock(&vq->mutex);
+   vhost_disable_notify(&vsock->dev, vq);
+   for (;;) {
+   struct virtio_vsock_pkt *pkt;
+   struct iov_iter iov_iter;
+   unsigned out, in;
+   struct sock *sk;
+   size_t nbytes;
+   size_t len;
+   int head;
+
+   if (list_empty(&vsock->send_pkt_list)) {
+   vhost_enable_notify(&vsock->dev, vq);
+   break;
+   }
+
+   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+ &out, &in, NULL, NULL);
+   pr_debug("%s: head = %d\n", __func__, head);
+   if (head < 0)
+   break;
+
+   if (head == vq->num) {
+   if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+   vhost_disable_notify(&vsock->dev, vq);
+   continue;
+   }
+   break;
+   }
+
+   pkt = list_first_entry(&vsock->send_pkt_list,
+  struct virtio_vsock_pkt, list);
+   list_del_init(&pkt->list);
+
+   if (out) {
+   virtio_transport_free_pkt(pkt);
+   vq_err(vq, "Expected 0 output buffers, got %u\n", out);
+   break;
+   }
+
+   len = iov_length(&vq->iov[out], in);
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+
+   nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
+   if (nbytes != sizeof(pkt->hdr)) {
+   virtio_transport_free_pkt(pkt);
+   vq_err(vq, "Faulted on copying pkt hdr\n");
+   break;
+   }
+
+   nbytes = co

[PATCH v2 2/5] VSOCK: Introduce virtio-vsock-common.ko

2015-12-01 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 include/linux/virtio_vsock.h|  209 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vsock.h   |   89 +++
 net/vmw_vsock/virtio_transport_common.c | 1272 +++
 4 files changed, 1571 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 000..a5f3ecc
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,209 @@
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2013-2015
+ * Copyright (C) Asias He <as...@redhat.com>, 2013
+ * Copyright (C) Stefan Hajnoczi <stefa...@redhat.com>, 2015
+ */
+
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE   (1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE (1024 * 64)
+
+struct vsock_transport_recv_notify_data;
+struct vsock_transport_send_notify_data;
+struct sockaddr_vm;
+struct vsock_sock;
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* virtio transport socket state */
+struct virtio_transport {
+   struct virtio_transport_pkt_ops *ops;
+   struct vsock_sock *vsk;
+
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   struct list_head rx_queue;
+   u32 rx_bytes;
+
+   /* Protected by trans->tx_lock */
+   u32 tx_cnt;
+   u32 buf_alloc;
+   u32 peer_fwd_cnt;
+   u32 peer_buf_alloc;
+   /* Protected by trans->rx_lock */
+   u32 fwd_cnt;
+
+   /* Protected by sk_lock */
+   u16 dgram_id;
+   struct list_head incomplete_dgrams; /* dgram fragments */
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct virtio_transport *trans;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+   u32 remote_cid, remote_port;
+   struct msghdr *msg;
+   u32 pkt_l

[PATCH v2 5/5] VSOCK: Add Makefile and Kconfig

2015-12-01 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 drivers/vhost/Kconfig   |  4 
 drivers/vhost/Kconfig.vsock |  7 +++
 drivers/vhost/Makefile  |  4 
 net/vmw_vsock/Kconfig   | 18 ++
 net/vmw_vsock/Makefile  |  2 ++
 5 files changed, 35 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..81449bf 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -47,3 +47,7 @@ config VHOST_CROSS_ENDIAN_LEGACY
  adds some overhead, it is disabled by default.
 
  If unsure, say "N".
+
+if STAGING
+source "drivers/vhost/Kconfig.vsock"
+endif
diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
new file mode 100644
index 000..3491865
--- /dev/null
+++ b/drivers/vhost/Kconfig.vsock
@@ -0,0 +1,7 @@
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   default n
+   ---help---
+   Say M here to enable the vhost-vsock for virtio-vsock guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..74e0bc8 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine runs on Qemu/KVM.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/5] VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic

2015-12-01 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/net/af_vsock.h   |  2 ++
 net/vmw_vsock/af_vsock.c | 70 
 2 files changed, 72 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index e9eb2d6..a0c8fa2 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -175,8 +175,10 @@ void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7fd1220..77247a2 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -223,6 +223,17 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
 }
 
+static struct sock *__vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct vsock_sock *vsk;
+
+   list_for_each_entry(vsk, vsock_unbound_sockets, bound_table)
+   if (addr->svm_port == vsk->local_addr.svm_port)
+   return sk_vsock(vsk);
+
+   return NULL;
+}
+
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
  struct sockaddr_vm *dst)
 {
@@ -298,6 +309,21 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 }
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct sock *sk;
+
+   spin_lock_bh(&vsock_table_lock);
+   sk = __vsock_find_unbound_socket(addr);
+   if (sk)
+   sock_hold(sk);
+
+   spin_unlock_bh(&vsock_table_lock);
+
+   return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_unbound_socket);
+
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst)
 {
@@ -532,6 +558,50 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
return 0;
 }
 
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr)
+{
+   static u32 port = LAST_RESERVED_PORT + 1;
+   struct sockaddr_vm new_addr;
+
+   vsock_addr_init(&new_addr, addr->svm_cid, addr->svm_port);
+
+   if (addr->svm_port == VMADDR_PORT_ANY) {
+   bool found = false;
+   unsigned int i;
+
+   for (i = 0; i < MAX_PORT_RETRIES; i++) {
+   if (port <= LAST_RESERVED_PORT)
+   port = LAST_RESERVED_PORT + 1;
+
+   new_addr.svm_port = port++;
+
+   if (!__vsock_find_unbound_socket(&new_addr)) {
+   found = true;
+   break;
+   }
+   }
+
+   if (!found)
+   return -EADDRNOTAVAIL;
+   } else {
+   /* If port is in reserved range, ensure caller
+* has necessary privileges.
+*/
+   if (addr->svm_port <= LAST_RESERVED_PORT &&
+   !capable(CAP_NET_BIND_SERVICE)) {
+   return -EACCES;
+   }
+
+   if (__vsock_find_unbound_socket(&new_addr))
+   return -EADDRINUSE;
+   }
+
+   vsock_addr_init(&vsk->local_addr, new_addr.svm_cid, new_addr.svm_port);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_bind_dgram_generic);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
  struct sockaddr_vm *addr)
 {
-- 
2.5.0



Re: best way to create a snapshot of a running vm ?

2015-12-01 Thread Stefan Hajnoczi
On Mon, Nov 30, 2015 at 04:34:14PM +0100, Lentes, Bernd wrote:
> Stefan wrote:
> 
> > 
> > Hi Bernd,
> > qemu-img cannot be used on the disk image when the VM is running.
> > Please use virsh, it communicates with the running QEMU process and
> > ensures that the snapshot is crash-consistent.
> > 
> 
> Hi Stefan,
> 
> thanks for your answer.
> 
> i read that virsh uses internally qemu-img
> (http://serverfault.com/questions/692435/qemu-img-snapshot-on-live-vm).
> Is that true ?

It's false in the general case.

While the VM is running libvirt will use the QEMU monitor to communicate
with the QEMU process instead of using qemu-img.

While the VM is shut down libvirt can use qemu-img safely.

The reason why qemu-img isn't safe is that the image file might be
written to by the running VM at the same time as qemu-img reads/writes
it.  This can corrupt image files.

Stefan




Re: [PATCH v8 0/5] implement vNVDIMM

2015-11-30 Thread Stefan Hajnoczi
test or emulation, however,
> in the real world, the files passed to guest are:
> - the regular file in the filesystem with DAX enabled created on NVDIMM device
>   on host
> - the raw PMEM device on host, e,g /dev/pmem0
> Memory access on the address created by mmap on these kinds of files can
> directly reach NVDIMM device on host.
> 
> --- vConfigure data area design ---
> Each NVDIMM device has a configure data area which is used to store label
> namespace data. In order to emulate this area, we divide the file into two
> parts:
> - first parts is (0, size - 128K], which is used as PMEM
> - 128K at the end of the file, which is used as Label Data Area
> So that the label namespace data can be persistent during power loss or system
> failure.
> 
> We also support passing the whole file to guest without reserve any region for
> label data area which is achieved by "reserve-label-data" parameter - if it's
> false then QEMU will build static and readonly namespace in memory and that
> namespace contains the whole file size. The parameter is false on default.
> 
> --- _DSM method design ---
> _DSM in ACPI is used to configure NVDIMM, currently we only allow access of
> label namespace data, i.e, Get Namespace Label Size (Function Index 4),
> Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
> (Function Index 6)
> 
> _DSM uses two pages to transfer data between ACPI and Qemu, the first page
> is RAM-based used to save the input info of _DSM method and Qemu reuse it
> store output info and another page is MMIO-based, ACPI write data to this
> page to transfer the control to Qemu
> 
> == Test ==
> In host
> 1) create memory backed file, e.g # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
> 2) append "-object memory-backend-file,share,id=mem1,
>mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
>id=nv1" in QEMU command line
> 
> In guest, download the latest upstream kernel (4.2 merge window) and enable
> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
> 1) insmod drivers/nvdimm/libnvdimm.ko
> 2) insmod drivers/acpi/nfit.ko
> 3) insmod drivers/nvdimm/nd_btt.ko
> 4) insmod drivers/nvdimm/nd_pmem.ko
> You can see the whole nvdimm device used as a single namespace and /dev/pmem0
> appears. You can do whatever on /dev/pmem0 including DAX access.
> 
> Currently Linux NVDIMM driver does not support namespace operation on this
> kind of PMEM, apply below changes to support dynamical namespace:
> 
> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
> continue;
> }
>  
> -   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +   if (nfit_mem->memdev_pmem)
> flags |= NDD_ALIASING;
> 
> You can append another NVDIMM device in guest and do:   
> # cd /sys/bus/nd/devices/
> # cd namespace1.0/
> # echo `uuidgen` > uuid
> # echo `expr 1024 \* 1024 \* 128` > size
> then reload nd.pmem.ko
> 
> You can see /dev/pmem1 appears
> 
> Xiao Guangrong (5):
>   nvdimm: implement NVDIMM device abstract
>   acpi: support specified oem table id for build_header
>   nvdimm acpi: build ACPI NFIT table
>   nvdimm acpi: build ACPI nvdimm devices
>   nvdimm: add maintain info
> 
>  MAINTAINERS    |   7 +
>  default-configs/i386-softmmu.mak   |   2 +
>  default-configs/x86_64-softmmu.mak |   2 +
>  hw/acpi/Makefile.objs  |   1 +
>  hw/acpi/aml-build.c|  15 +-
>  hw/acpi/ich9.c |  19 ++
>  hw/acpi/memory_hotplug.c   |   5 +
>  hw/acpi/nvdimm.c   | 467 +
>  hw/acpi/piix4.c|   4 +
>  hw/arm/virt-acpi-build.c   |  13 +-
>  hw/i386/acpi-build.c   |  26 ++-
>  hw/mem/Makefile.objs   |   1 +
>  hw/mem/nvdimm.c|  46 
>  include/hw/acpi/aml-build.h|   3 +-
>  include/hw/acpi/ich9.h |   3 +
>  include/hw/i386/pc.h   |  12 +-
>  include/hw/mem/nvdimm.h|  41 
>  17 files changed, 645 insertions(+), 22 deletions(-)
>  create mode 100644 hw/acpi/nvdimm.c
>  create mode 100644 hw/mem/nvdimm.c
>  create mode 100644 include/hw/mem/nvdimm.h

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>




Re: best way to create a snapshot of a running vm ?

2015-11-30 Thread Stefan Hajnoczi
On Mon, Nov 30, 2015 at 12:36:56AM +0100, Lentes, Bernd wrote:
> what is the best way to create a snapshot of a running vm? qemu-img or virsh?
> I'd like to create a snapshot which is copied afterwards by other means, e.g.
> by a network based backup software.

Hi Bernd,
qemu-img cannot be used on the disk image when the VM is running.
Please use virsh, it communicates with the running QEMU process and
ensures that the snapshot is crash-consistent.

Stefan




Re: [Qemu-devel] [PATCH v8 0/5] implement vNVDIMM

2015-11-23 Thread Stefan Hajnoczi
On Thu, Nov 19, 2015 at 10:39:05AM +0800, Xiao Guangrong wrote:
> On 11/19/2015 04:44 AM, Michael S. Tsirkin wrote:
> >On Wed, Nov 18, 2015 at 05:18:17PM -0200, Eduardo Habkost wrote:
> >>On Wed, Nov 18, 2015 at 09:59:34AM +0800, Xiao Guangrong wrote:
> >sorry, I'm busy with 2.5 now, and this is clearly not 2.5 material.
> 
> I still see some pull requests were sent out for the 2.5 merge window today and
> yesterday ...
> 
> This patchset is the simplest version we can figure out to implement basic
> functionality for vNVDIMM and only minor change is needed for other code.
> It would be nice and really appreciate if it can go to 2.5.

Here is the release schedule:
http://qemu-project.org/Planning/2.5

QEMU is in hard freeze right now.  That means only critical bug fixes
are being merged.  No new features will be merged until the QEMU 2.6
development cycle begins.

Stefan




Re: [PATCH v7 00/35] implement vNVDIMM

2015-11-02 Thread Stefan Hajnoczi
I have reviewed the ACPI interface:

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>




Re: [PATCH v6 00/33] implement vNVDIMM

2015-10-30 Thread Stefan Hajnoczi
>  hw/mem/pc-dimm.c   |  510 +
>  hw/ppc/spapr.c |   20 +-
>  include/hw/acpi/aml-build.h|7 +
>  include/hw/acpi/ich9.h |3 +
>  include/hw/i386/pc.h   |   12 +-
>  include/hw/mem/dimm.h  |   95 ++
>  include/hw/mem/nvdimm.h|  133 +++
>  include/hw/mem/pc-dimm.h   |  104 +-
>  include/hw/ppc/spapr.h |2 +-
>  include/qemu/osdep.h   |1 +
>  numa.c |4 +-
>  qapi-schema.json   |8 +-
>  qmp.c      |4 +-
>  stubs/Makefile.objs|2 +-
>  ...c_dimm_device_list.c => qmp_dimm_device_list.c} |4 +-
>  target-ppc/kvm.c   |   21 +-
>  trace-events   |8 +-
>  util/oslib-posix.c |   16 +
>  util/oslib-win32.c |5 +
>  43 files changed, 2224 insertions(+), 838 deletions(-)
>  create mode 100644 docs/specs/acpi_nvdimm.txt
>  create mode 100644 hw/acpi/nvdimm.c
>  rename hw/mem/{pc-dimm.c => dimm.c} (65%)
>  create mode 100644 hw/mem/nvdimm.c
>  rewrite hw/mem/pc-dimm.c (91%)
>  create mode 100644 include/hw/mem/dimm.h
>  create mode 100644 include/hw/mem/nvdimm.h
>  rewrite include/hw/mem/pc-dimm.h (97%)
>  rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (56%)

I've reviewed the interface that ACPI inside the guest uses to
communicate with QEMU.  I haven't reviewed the actual ACPI generation or
pc-dimm device model parts.

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>




Re: [PATCH v6 27/33] nvdimm acpi: support function 0

2015-10-30 Thread Stefan Hajnoczi
On Fri, Oct 30, 2015 at 01:56:21PM +0800, Xiao Guangrong wrote:
>  static uint64_t
>  nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>  {
> -return 0;
> +AcpiNVDIMMState *state = opaque;
> +MemoryRegion *dsm_ram_mr = &state->ram_mr;
> +NvdimmDsmIn *in;
> +GArray *out;
> +void *dsm_ram_addr;
> +uint32_t buf_size;
> +
> +assert(memory_region_size(dsm_ram_mr) >= sizeof(NvdimmDsmIn));
> +dsm_ram_addr = memory_region_get_ram_ptr(dsm_ram_mr);
> +
> +/*
> + * The DSM memory is mapped to guest address space so an evil guest
> + * can change its content while we are doing DSM emulation. Avoid
> + * this by copying DSM memory to QEMU local memory.
> + */
> +in = g_malloc(memory_region_size(dsm_ram_mr));
> +memcpy(in, dsm_ram_addr, memory_region_size(dsm_ram_mr));
> +
> +le32_to_cpus(&in->revision);
> +le32_to_cpus(&in->function);
> +le32_to_cpus(&in->handle);
> +
> +nvdimm_debug("Revision %#x Handler %#x Function %#x.\n", in->revision,
> + in->handle, in->function);
> +
> +out = g_array_new(false, true /* clear */, 1);
> +
> +if (in->revision != 0x1 /* Current we support DSM Spec Rev1. */) {
> +nvdimm_debug("Revision %#x is not supported, expect %#x.\n",
> +  in->revision, 0x1);
> +nvdimm_dsm_write_status(out, NVDIMM_DSM_STATUS_NOT_SUPPORTED);
> +goto exit;
> +}
> +
> +/* Handle 0 is reserved for NVDIMM Root Device. */
> +if (!in->handle) {
> +nvdimm_dsm_root(in, out);
> +goto exit;
> +}
> +
> +nvdimm_dsm_device(in, out);
> +
> +exit:
> +/* Write output result to dsm memory. */
> +memcpy(dsm_ram_addr, out->data, out->len);
> +memory_region_set_dirty(dsm_ram_mr, 0, out->len);

If you respin this series, please add this before the memcpy out:

  assert(out->len <= memory_region_size(dsm_ram_mr));

That way we can catch situations where too much output data was
generated by mistake.
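
As a rough, self-contained sketch of that check in plain C (not QEMU code): struct dsm_out and dsm_copy_out() are made-up stand-ins for the GArray and the memcpy at the end of nvdimm_dsm_read(), and the region is just a byte array here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the GArray that holds the DSM output. */
struct dsm_out {
    const uint8_t *data;
    size_t len;
};

/*
 * Copy the generated output into the shared DSM region and return the
 * number of bytes copied. The assert states the invariant up front:
 * the output must never be larger than the region it is written to.
 */
static size_t dsm_copy_out(uint8_t *region, size_t region_size,
                           const struct dsm_out *out)
{
    assert(out->len <= region_size);
    memcpy(region, out->data, out->len);
    return out->len;
}
```

The assert only documents and enforces an internal invariant; it does not replace validation of guest-controlled input.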




Re: [Qemu-devel] [PATCH v5 28/33] nvdimm acpi: support Get Namespace Label Size function

2015-10-29 Thread Stefan Hajnoczi
On Thu, Oct 29, 2015 at 10:16:14AM +0800, Xiao Guangrong wrote:
> 
> 
> On 10/29/2015 12:41 AM, Stefan Hajnoczi wrote:
> >On Wed, Oct 28, 2015 at 10:26:26PM +, Xiao Guangrong wrote:
> >>+struct nvdimm_func_in_get_label_data {
> >>+uint32_t offset; /* the offset in the namespace label data area. */
> >>+uint32_t length; /* the size of data is to be read via the function. */
> >>+} QEMU_PACKED;
> >>+typedef struct nvdimm_func_in_get_label_data nvdimm_func_in_get_label_data;
> >
> >./CODING_STYLE "3. Naming":
> >
> >   Structured type names are in CamelCase; harder to type but standing
> >   out.
> 
> Did not realize it before. Will change its name to:
> NVDIMMFuncInGetLabelData

Great, thanks!

> >>+/*
> >>+ * the max transfer size is the max size transferred by both a
> >>+ * 'Get Namespace Label Data' function and a 'Set Namespace Label Data'
> >>+ * function.
> >>+ */
> >>+static uint32_t nvdimm_get_max_xfer_label_size(void)
> >>+{
> >>+nvdimm_dsm_in *in;
> >>+uint32_t max_get_size, max_set_size, dsm_memory_size = getpagesize();
> >
> >Why is the host's page size relevant here?  Did you mean
> >TARGET_PAGE_SIZE?
> 
> Yes.
> 
> NVDIMM is the common code, unfortunately TARGET_PAGE_SIZE is platform
> specified and QEMU lacks a place to include this kind of specified definition:

Can you make NVDIMM a per-target object file?

Although we try to avoid it whenever possible, it means that
qemu-system-x86_64, qemu-system-arm, etc will build
x86_64-softmmu/hw/.../nvdimm.o, arm-softmmu/hw/.../nvdimm.o, etc.

In Makefile.objs put the nvdimm object file in obj-y instead of
common-obj-y.




Re: [PATCH v5 27/33] nvdimm acpi: support function 0

2015-10-28 Thread Stefan Hajnoczi
On Wed, Oct 28, 2015 at 10:26:25PM +, Xiao Guangrong wrote:
> __DSM is defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method)
> 
> Function 0 is a query function. We do not support any function on root
> device and only 3 functions are support for NVDIMM device, Get Namespace
> Label Size, Get Namespace Label Data and Set Namespace Label Data, that
> means we currently only allow to access device's Label Namespace
> 
> Signed-off-by: Xiao Guangrong <guangrong.x...@linux.intel.com>
> ---
>  hw/acpi/aml-build.c |   2 +-
>  hw/acpi/nvdimm.c| 156 +++-
>  include/hw/acpi/aml-build.h |   1 +
>  3 files changed, 157 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>




Re: [PATCH v5 30/33] nvdimm acpi: support Set Namespace Label Data function

2015-10-28 Thread Stefan Hajnoczi
On Wed, Oct 28, 2015 at 10:26:28PM +, Xiao Guangrong wrote:
> +static void nvdimm_dsm_func_set_label_data(NVDIMMDevice *nvdimm,
> +   nvdimm_dsm_in *in, GArray *out)
> +{
> +NVDIMMClass *nvc = NVDIMM_GET_CLASS(nvdimm);
> +nvdimm_func_in_set_label_data *set_label_data = &in->func_set_label_data;
> +uint32_t status;
> +
> +le32_to_cpus(&set_label_data->offset);
> +le32_to_cpus(&set_label_data->length);
> +
> +nvdimm_debug("Write Label Data: offset %#x length %#x.\n",
> + set_label_data->offset, set_label_data->length);
> +
> +if (nvdimm->label_size < set_label_data->offset + set_label_data->length) {

Integer overflow.




Re: [PATCH v5 29/33] nvdimm acpi: support Get Namespace Label Data function

2015-10-28 Thread Stefan Hajnoczi
On Wed, Oct 28, 2015 at 10:26:27PM +, Xiao Guangrong wrote:
> +static void nvdimm_dsm_func_get_label_data(NVDIMMDevice *nvdimm,
> +   nvdimm_dsm_in *in, GArray *out)
> +{
> +NVDIMMClass *nvc = NVDIMM_GET_CLASS(nvdimm);
> +nvdimm_func_in_get_label_data *get_label_data = &in->func_get_label_data;
> +void *buf;
> +uint32_t status = NVDIMM_DSM_STATUS_SUCCESS;
> +
> +le32_to_cpus(&get_label_data->offset);
> +le32_to_cpus(&get_label_data->length);
> +
> +nvdimm_debug("Read Label Data: offset %#x length %#x.\n",
> + get_label_data->offset, get_label_data->length);
> +
> +if (nvdimm->label_size < get_label_data->offset + get_label_data->length) {

Integer overflow isn't handled here and it's unclear if that can cause
problems later on.  It's safest to catch it right away instead of
relying on nvc->read_label_data() to check again.
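
A minimal sketch of a check that cannot wrap around, written as standalone C
rather than against the patch.  The helper name label_range_ok() is
hypothetical, and the 64-bit type chosen for the label size is an assumption;
the point is only that offset + length is never computed:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true when the range
 * [offset, offset + length) lies inside a label area of label_size bytes.
 * The sum offset + length is never formed, so 32-bit wrap-around cannot
 * make an out-of-range request look valid.
 */
static bool label_range_ok(uint64_t label_size, uint32_t offset,
                           uint32_t length)
{
    if (offset > label_size) {
        return false;
    }
    /* label_size - offset cannot underflow because offset <= label_size. */
    return length <= label_size - offset;
}
```

With a naive 32-bit addition, offset 0xfffffff0 and length 0x20 sum to 0x10
and would pass a "label_size < offset + length" test; the helper above
rejects that request.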




Re: [PATCH v5 28/33] nvdimm acpi: support Get Namespace Label Size function

2015-10-28 Thread Stefan Hajnoczi
On Wed, Oct 28, 2015 at 10:26:26PM +, Xiao Guangrong wrote:
> +struct nvdimm_func_in_get_label_data {
> +uint32_t offset; /* the offset in the namespace label data area. */
> +uint32_t length; /* the size of data is to be read via the function. */
> +} QEMU_PACKED;
> +typedef struct nvdimm_func_in_get_label_data nvdimm_func_in_get_label_data;

./CODING_STYLE "3. Naming":

  Structured type names are in CamelCase; harder to type but standing
  out.

I'm surprised that scripts/checkpatch.pl didn't warn about this.

> +/*
> + * the max transfer size is the max size transferred by both a
> + * 'Get Namespace Label Data' function and a 'Set Namespace Label Data'
> + * function.
> + */
> +static uint32_t nvdimm_get_max_xfer_label_size(void)
> +{
> +nvdimm_dsm_in *in;
> +uint32_t max_get_size, max_set_size, dsm_memory_size = getpagesize();

Why is the host's page size relevant here?  Did you mean
TARGET_PAGE_SIZE?




Re: virtio assisted migration

2015-10-28 Thread Stefan Hajnoczi
On Sat, Oct 24, 2015 at 05:21:01PM +0800, Dave Young wrote:
> * block device
> For block storage migration, the problem is similar as memory. The original
> migration does not consider storage usage ratio it just copy all the sectors.

This is equivalent to issuing discard requests.  File systems can
already do that.

The block/mirror.c code checks if sectors are allocated so it should
be possible to achieve this today (with guest cooperation).

Stefan




Re: [PATCH v4 28/33] nvdimm acpi: support DSM_FUN_IMPLEMENTED function

2015-10-21 Thread Stefan Hajnoczi
On Wed, Oct 21, 2015 at 12:26:35AM +0800, Xiao Guangrong wrote:
> 
> 
> On 10/20/2015 11:51 PM, Stefan Hajnoczi wrote:
> >On Mon, Oct 19, 2015 at 08:54:14AM +0800, Xiao Guangrong wrote:
> >>+exit:
> >>+/* Write our output result to dsm memory. */
> >>+((dsm_out *)dsm_ram_addr)->len = out->len;
> >
> >Missing byteswap?
> >
> >I thought you were going to remove this field because it wasn't needed
> >by the guest.
> >
> 
> The @len is the size of _DSM result buffer, for example, for the function of
> DSM_FUN_IMPLEMENTED the result buffer is 8 bytes, and for
> DSM_DEV_FUN_NAMESPACE_LABEL_SIZE the buffer size is 4 bytes. It tells ASL code
> how much size of memory we need to return to the _DSM caller.
> 
> In _DSM code, it's handled like this:
> 
> "RLEN" is @len, “OBUF” is the left memory in DSM page.
> 
> /* get @len*/
> aml_append(method, aml_store(aml_name("RLEN"), aml_local(6)));
> /* @len << 3 to get bits. */
> aml_append(method, aml_store(aml_shiftleft(aml_local(6),
>aml_int(3)), aml_local(6)));
> 
> /* get @len << 3 bits from OBUF, and return it to the caller. */
> aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
> aml_local(6) , "OBUF"));
> 
> Since @len is our internally used, it's not return to guest, so i did not do
> byteswap here.

I am not familiar with the ACPI details, but I think this emits bytecode
that will be run by the guest's ACPI interpreter?

You still need to define the endianness of fields since QEMU and the
guest could have different endianness.

In other words, will the following work if a big-endian ppc host is
running a little-endian x86 guest?

  ((dsm_out *)dsm_ram_addr)->len = out->len;
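
To make the concern concrete, here is a small standalone sketch of storing a
field with a fixed little-endian layout, independent of the host's byte
order.  The helper names store_le32() and load_le32() are hypothetical, and
the 32-bit width is an assumption about the field:

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not QEMU API): write and read a 32-bit value as
 * little-endian bytes.  The byte layout in memory is the same on big-endian
 * and little-endian hosts, so the reader never depends on the writer's CPU.
 */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

A plain struct assignment on a big-endian host would instead place the most
significant byte first, which a little-endian reader would misinterpret.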

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 28/33] nvdimm acpi: support DSM_FUN_IMPLEMENTED function

2015-10-20 Thread Stefan Hajnoczi
On Mon, Oct 19, 2015 at 08:54:14AM +0800, Xiao Guangrong wrote:
> +exit:
> +/* Write our output result to dsm memory. */
> +((dsm_out *)dsm_ram_addr)->len = out->len;

Missing byteswap?

I thought you were going to remove this field because it wasn't needed
by the guest.


Re: just an observation about USB

2015-10-16 Thread Stefan Hajnoczi
On Wed, Oct 14, 2015 at 04:30:22PM -0400, Eric S. Johansson wrote:
> On 10/14/2015 04:04 PM, Paolo Bonzini wrote:
> >On 14/10/2015 21:39, Eric S. Johansson wrote:
> >>Latency is a bit longer than I like. USB and network connections break
> >>every time I come out of suspend part at least I don't have to use
> >>Windows all the time.
> >>
> >>  One thing is puzzling though. Windows, in idle, consume something like
> >>15 to 20% CPU according to top. I turn on NaturallySpeaking, the
> >>utilization climbs to him roughly 30 to 40%. I turn on the microphone
> >>and utilization jumps up to 80-110%.  In other words, it takes up a
> >>whole core.
> >USB is really expensive because it's all done through polling.  Do that
> >in hardware, and your computer is a bit hotter; do that in software
> >(that's what VMs do) and your computer doubles as a frying pan.
> >
> >If you have USB3 drivers in Windows, you can try using a USB3
> >controller.  But it's probably going to waste a lot of processing power
> >too, because USB audio uses a lot of small packets, making it basically
> >the worst case.
> 
>  Okay, then let's try to solve this a different way. What's the cleanest,
> lowest latency way of delivering audio to a virtual machine that doesn't use
> USB in the virtual machine?

QEMU can emulate PCI soundcards, including the Intel HD Audio codec
cards (-device intel-hda or -soundhw hda might do the trick).

Low latency and power consumption are usually at odds with each other.
That's because real-time audio requires processing small buffers many
times per second, which means lots of interrupts and higher power
consumption.

Anyway, PCI should be an improvement over USB audio.

Stefan


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-15 Thread Stefan Hajnoczi
On Wed, Oct 14, 2015 at 10:50:40PM +0800, Xiao Guangrong wrote:
> On 10/14/2015 05:40 PM, Stefan Hajnoczi wrote:
> >On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
> >>+out = (dsm_out *)in;
> >>+
> >>+revision = in->arg1;
> >>+function = in->arg2;
> >>+handle = in->handle;
> >>+le32_to_cpus(&revision);
> >>+le32_to_cpus(&function);
> >>+le32_to_cpus(&handle);
> >>+
> >>+nvdebug("UUID " UUID_FMT ".\n", in->arg0[0], in->arg0[1], in->arg0[2],
> >>+in->arg0[3], in->arg0[4], in->arg0[5], in->arg0[6],
> >>+in->arg0[7], in->arg0[8], in->arg0[9], in->arg0[10],
> >>+in->arg0[11], in->arg0[12], in->arg0[13], in->arg0[14],
> >>+in->arg0[15]);
> >>+nvdebug("Revision %#x Function %#x Handler %#x.\n", revision, function,
> >>+handle);
> >>+
> >>+if (revision != DSM_REVISION) {
> >>+nvdebug("Revision %#x is not supported, expect %#x.\n",
> >>+revision, DSM_REVISION);
> >>+goto exit;
> >>+}
> >>+
> >>+if (!handle) {
> >>+if (!dsm_is_root_uuid(in->arg0)) {
> >
> >Please don't dereference 'in' or pass it to other functions.  Avoid race
> >conditions with guest vcpus by copying in the entire dsm_in struct.
> >
> >This is like a system call - the kernel cannot trust userspace memory
> >and must copy in before accessing data.  The same rules apply.
> >
> 
> It's little different for QEMU:
> - the memory address is always valid to QEMU, it's not always true for Kernel
>   due to context-switch
> 
> - we have checked the header before use it's data, for example, when we get
>   data from GET_NAMESPACE_DATA, we have got the @offset and @length from the
>   memory, then copy memory based on these values, that means the userspace
>   has no chance to cause buffer overflow by increasing these values at runtime.
> 
>   The scenario for our case is simple but Kernel is difficult to do
>   check_all_before_use as many paths may be involved.
> 
> - guest changes some data is okay, the worst case is that the label data is
>   corrupted. This is caused by guest itself. Kernel also supports this kind
>   of behaviour, e,g. network TX zero copy, the userspace page is being
>   transferred while userspace can still access it.
> 
> - it's 4K size on x86, full copy wastes CPU time too much.

This isn't performance-critical code and I don't want to review it
keeping the race conditions in mind the whole time.  Also, if the code
is modified in the future, the chance of introducing a race is high.

I see this as premature optimization, please just copy in data.
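
As a rough illustration of "copy in first", the sketch below takes a private
snapshot of a request header before anything is validated.  The struct
layout and the name dsm_copy_in() are hypothetical stand-ins, not the
definitions from the patch:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the request header that lives in guest memory. */
struct dsm_request {
    uint32_t handle;
    uint32_t revision;
    uint32_t function;
};

/*
 * Take a private snapshot of the request.  All later checks and uses go
 * through *snap; the shared buffer is not read again, so a guest vcpu that
 * keeps writing to it cannot change a value after it has been validated.
 */
static void dsm_copy_in(struct dsm_request *snap, const void *shared)
{
    memcpy(snap, shared, sizeof(*snap));
}
```

Once the snapshot exists, review only has to reason about ordinary local
data, which is the maintenance benefit argued for above.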

Stefan


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-15 Thread Stefan Hajnoczi
On Wed, Oct 14, 2015 at 10:52:15PM +0800, Xiao Guangrong wrote:
> On 10/14/2015 05:41 PM, Stefan Hajnoczi wrote:
> >On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
> >>+out->len = sizeof(out->status);
> >
> >out->len is uint16_t, it needs cpu_to_le16().  There may be other
> >instances in this patch series.
> >
> 
> out->len is internally used only which is invisible to guest OS, i,e,
> we write this value and read this value by ourself. I think it is
> okay.

'out' points to guest memory.  Guest memory is untrusted so QEMU cannot
stash values there - an evil guest could modify them.

Please put the len variable on the QEMU stack or heap where the guest
cannot access it.


Re: Inconsistent guest OS disk size compared to volume.img size

2015-10-14 Thread Stefan Hajnoczi
On Tue, Sep 29, 2015 at 12:02:17AM -0700, Jay Fishman wrote:
> I  have looked all over the internet but I can not even find a
> reference to this issue.
> 
> 
> I have installed the following on Linux Mint 17.1
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.19), Fabrice Bellard
> 
> On that, I have created a Ubuntu 14.04.3 LTS guest and created a
> storage volume of 12.88GB. The format that I used was raw.
> 
> The host uses a physical mirrored drive and I did NOT use LVM (ext4
> was the format type)
> 
> When installing the guest, I selected to "use entire disk" and again I
> did NOT use LVM (ext4 was the format type)
> 
> 
> After installation, the guest reports I am using 23.8% of 4.84GB. Why
> is the disk size 4.84GB instead of 12.88GB?
> 
> The size of the guest virtual disk is being reduced by almost a third?

If you still need help with this, please provide the following
information:

1. Output of "fdisk -lu /dev/vda" and "df -h /" from inside the guest.

   You may need to adjust the block device path if the root file system
   isn't on the first virtio-blk device (e.g. /dev/vdb or /dev/sda).

2. Output of "stat disk.img" from the host, where "disk.img" is the
   filename.

Thanks,
Stefan


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Stefan Hajnoczi
On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
>  static void dsm_write(void *opaque, hwaddr addr,
>uint64_t val, unsigned size)
>  {
> +NVDIMMState *state = opaque;
> +MemoryRegion *dsm_ram_mr;
> +dsm_in *in;
> +dsm_out *out;
> +uint32_t revision, function, handle;
> +
>  if (val != NOTIFY_VALUE) {
>  fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
>  }
> +
> +dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
> +state->page_size).mr;
> +memory_region_unref(dsm_ram_mr);
> +in = memory_region_get_ram_ptr(dsm_ram_mr);

This looks suspicious.  Shouldn't the memory_region_unref(dsm_ram_mr)
happen after we're done using it?

> +out = (dsm_out *)in;
> +
> +revision = in->arg1;
> +function = in->arg2;
> +handle = in->handle;
> +le32_to_cpus(&revision);
> +le32_to_cpus(&function);
> +le32_to_cpus(&handle);
> +
> +nvdebug("UUID " UUID_FMT ".\n", in->arg0[0], in->arg0[1], in->arg0[2],
> +in->arg0[3], in->arg0[4], in->arg0[5], in->arg0[6],
> +in->arg0[7], in->arg0[8], in->arg0[9], in->arg0[10],
> +in->arg0[11], in->arg0[12], in->arg0[13], in->arg0[14],
> +in->arg0[15]);
> +nvdebug("Revision %#x Function %#x Handler %#x.\n", revision, function,
> +handle);
> +
> +if (revision != DSM_REVISION) {
> +nvdebug("Revision %#x is not supported, expect %#x.\n",
> +revision, DSM_REVISION);
> +goto exit;
> +}
> +
> +if (!handle) {
> +if (!dsm_is_root_uuid(in->arg0)) {

Please don't dereference 'in' or pass it to other functions.  Avoid race
conditions with guest vcpus by copying in the entire dsm_in struct.

This is like a system call - the kernel cannot trust userspace memory
and must copy in before accessing data.  The same rules apply.


Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?

2015-10-14 Thread Stefan Hajnoczi
On Thu, Oct 08, 2015 at 07:59:56PM +0800, charlie.song wrote:
> We recently try to use Linux AIO from guest OS and find that the IOthread 
> mechanism of Qemu-KVM will reorder I/O requests from guest OS 
> even when the AIO write requests are issued from a single thread in order. 
> This does not happen on the host OS however.

I think you are describing a situation where a guest submits multiple
overlapping I/O requests at the same time.

virtio-blk does not guarantee a specific request ordering, so the
application needs to wait for request completion if ordering matters.

io_submit(2) also does not make guarantees about ordering.

Stefan


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Stefan Hajnoczi
On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
> +out->len = sizeof(out->status);

out->len is uint16_t, it needs cpu_to_le16().  There may be other
instances in this patch series.


Re: QEMU Technical Talk: NVDIMM and persistent memory in QEMU

2015-10-12 Thread Stefan Hajnoczi
On Mon, Oct 5, 2015 at 8:52 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:

Just a reminder that QEMU's first technical talk is today (Monday, 12
October 2015) at 14:00 UTC.  We will be using Hangouts On Air for
video/audio.  The URL is:
https://plus.google.com/events/cfssoojfogaafulssb1qeijn07k

Full details below:

> Marc Mari has volunteered to give the following online technical talk
> on Monday, 12 October at 14:00 UTC:
>
> "Marc Mari will present the new NVDIMM persistent memory device class
> and how they integrate into QEMU and SeaBIOS.  The main concepts of
> the hardware specification are covered, as well as how NVDIMMs can be
> used by virtual machines.
>
> This talk is aimed at QEMU and SeaBIOS developers."
>
> Marc has been experimenting with Guangrong Xiao's NVDIMM patches and
> is working on SeaBIOS boot-from-NVDIMM support.
>
> To join the event:
> https://plus.google.com/events/cfssoojfogaafulssb1qeijn07k
>
> This is the first QEMU technical talk and we will be using Google+'s
> Hangouts On Air feature for a live presentation.  Video will also be
> archived on YouTube for viewing at a later date.
>
> If you would like to speak on a technical topic, please contact me!  I
> hope to host talks showcasing features of interest to QEMU users as
> well as technical topics for QEMU developers.
>
> Stefan


Re: QEMU Technical Talk: NVDIMM and persistent memory in QEMU

2015-10-12 Thread Stefan Hajnoczi
Thanks to everyone who joined and to Marc Mari for giving the
presentation.  The next QEMU technical talk will be announced to the
mailing list in a few days.

Video: https://www.youtube.com/watch?v=Vit3-PjbN9M#t=13m02s
Slides (PDF): http://vmsplice.net/~stefan/nvdimm_slides_public.pdf

Stefan


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-10-09 Thread Stefan Hajnoczi
On Wed, Oct 07, 2015 at 10:43:40PM +0800, Xiao Guangrong wrote:
> 
> 
> On 10/07/2015 10:02 PM, Stefan Hajnoczi wrote:
> >On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
> >>On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> >>>Have you thought about live migration?
> >>>
> >>>Are the contents of the NVDIMM migrated since they are registered as a
> >>>RAM region?
> >>
> >>Will fully test live migration and VM save before sending the V3 out. :)
> >
> >Hi,
> >What is the status of this patch series?
> 
> This is huge change in v3, the patchset is ready now and it's being tested.
> Will post it out (hopefully this week) after the long holiday in China. :)

Great, thanks!

Stefan


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-10-07 Thread Stefan Hajnoczi
On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> >Have you thought about live migration?
> >
> >Are the contents of the NVDIMM migrated since they are registered as a
> >RAM region?
> 
> Will fully test live migration and VM save before sending the V3 out. :)

Hi,
What is the status of this patch series?

Stefan


Re: [Qemu-devel] [RFC PATCH v4] os-android: Add support to android platform

2015-10-06 Thread Stefan Hajnoczi
On Sat, Oct 03, 2015 at 12:44:14PM +0800, Houcheng Lin wrote:
> diff --git a/configure b/configure
> index d7c24cd..cda88c1 100755
> --- a/configure
> +++ b/configure
> @@ -567,7 +567,6 @@ fi
>  
>  # host *BSD for user mode
>  HOST_VARIANT_DIR=""
> -
>  case $targetos in
>  CYGWIN*)
>mingw32="yes"

Spurious whitespace change

> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index b1beaa6..44beee3 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -22,7 +22,6 @@
>   */
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 

What is the justification for this?  Do you know why io.h was included
before?

> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index ab3c876..9e26d10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -74,6 +74,14 @@ typedef unsigned int uint_fast16_t;
>  typedef signed int  int_fast16_t;
>  #endif
>  
> +#ifdef CONFIG_ANDROID
> +/*
> + * For include the basename prototyping in android.
> + */
> +#include 

Files that use basename(3) should include libgen.h.  Why include it
here?

> +#define IOV_MAX 1024

Are you sure that Android NDK headers do not contain this constant?

> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 3ae4987..4ae746b 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -62,6 +62,8 @@ extern int daemon(int, int);
>  #include 
>  #include 
>  #include 
> +#include 

Why did you include time.h?

> +#include 
>  
>  #ifdef CONFIG_LINUX
>  #include 
> @@ -482,3 +484,17 @@ int qemu_read_password(char *buf, int buf_size)
>  printf("\n");
>  return ret;
>  }
> +
> +int qemu_getdtablesize(void)
> +{
> +#ifdef CONFIG_ANDROID
> +struct rlimit r;
> +
> +if (getrlimit(RLIMIT_NOFILE, &r) < 0) {
> +return sysconf(_SC_OPEN_MAX);
> +}
> +return r.rlim_cur;
> +#else
> +return getdtablesize();
> +#endif
> +}

We can probably drop the getdtablesize() call completely and use the
CONFIG_ANDROID code on all platforms.  I suggest splitting this out into
a separate patch that introduces qemu_getdtablesize() and converts all
callers.
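
A sketch of what a single code path for all platforms could look like,
mirroring the CONFIG_ANDROID branch quoted above.  The function name
portable_getdtablesize() is a placeholder, and the clamping to INT_MAX is an
addition for the sketch, not something the patch does:

```c
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Placeholder name: query the descriptor table size without calling
 * getdtablesize().  Prefer the soft RLIMIT_NOFILE limit and fall back to
 * sysconf(_SC_OPEN_MAX) if getrlimit() fails.
 */
static int portable_getdtablesize(void)
{
    struct rlimit r;

    if (getrlimit(RLIMIT_NOFILE, &r) < 0) {
        return (int)sysconf(_SC_OPEN_MAX);
    }
    /* An unlimited or very large limit does not fit in an int. */
    if (r.rlim_cur == RLIM_INFINITY || r.rlim_cur > (rlim_t)INT_MAX) {
        return INT_MAX;
    }
    return (int)r.rlim_cur;
}
```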

> diff --git a/util/qemu-openpty.c b/util/qemu-openpty.c
> index 4c53211..b305886 100644
> --- a/util/qemu-openpty.c
> +++ b/util/qemu-openpty.c
> @@ -51,12 +51,17 @@
>  # include 
>  #endif
>  
> -#ifdef __sun__
> +#if defined(__sun__) || defined(CONFIG_ANDROID)
> +
>  /* Once Solaris has openpty(), this is going to be removed. */
>  static int openpty(int *amaster, int *aslave, char *name,
> struct termios *termp, struct winsize *winp)
>  {
> +#if defined(CONFIG_ANDROID)
> +char slave[PATH_MAX];
> +#else
>  const char *slave;
> +#endif
>  int mfd = -1, sfd = -1;
>  
>  *amaster = *aslave = -1;
> @@ -67,17 +72,22 @@ static int openpty(int *amaster, int *aslave, char *name,
>  
>  if (grantpt(mfd) == -1 || unlockpt(mfd) == -1)
>  goto err;
> -
> +#if defined(CONFIG_ANDROID)
> +if (ptsname_r(mfd, slave, PATH_MAX) < 0)
> +goto err;
> +#else
>  if ((slave = ptsname(mfd)) == NULL)
>  goto err;
> +#endif

ptsname_r(3) should be used on all Linux hosts because it is reentrant.
This improvement isn't Android-specific, please split it into a separate
patch.


Who wants to mentor for Outreachy Dec-Mar?

2015-09-09 Thread Stefan Hajnoczi
We are now looking for mentors for the next round of Outreachy running
from December 7, 2015 to March 7, 2016.  I have set up a wiki page
here:
http://qemu-project.org/Outreachy_2015_DecemberMarch

Our communities have participated in previous years to mentor people
from underrepresented groups and help them get involved in open source
software.  To learn more, see the Outreachy website:
https://www.gnome.org/outreachy/

If you are a regular contributor to QEMU, libvirt, or the KVM kernel
module then you can become an Outreachy mentor.  Information on what's
involved is here:
https://wiki.gnome.org/Outreachy/Admin/InfoForMentors

Mentoring summary:

1. Post your project ideas here:
http://qemu-project.org/Outreachy_2015_DecemberMarch

2. You give each applicant a different small task so they can submit a
patch upstream.  You also interview promising candidates on IRC to get
a better picture.  Then you select a candidate you wish to work with
(or none).

3. Requires 5 hours/week from December 2015 to March 2016 to mentor
your intern, review their code, answer their questions, etc.

If you'd like to become a mentor, please let me know.


We are also looking for sponsors who wish to fund Outreachy interns
for QEMU, libvirt, and the KVM kernel module.  The sponsorship for one
intern is $6,500.  Learn more about sponsorship:
https://wiki.gnome.org/Outreachy/Admin/InfoForOrgs

Stefan


Re: Who wants to mentor for Outreachy Dec-Mar?

2015-09-09 Thread Stefan Hajnoczi
On Wed, Sep 9, 2015 at 12:59 PM, Michal Privoznik <mpriv...@redhat.com> wrote:
> On 09.09.2015 12:28, Stefan Hajnoczi wrote:
>> We are now looking for mentors for the next round of Outreachy running
>> from December 7, 2015 to March 7, 2016.  I have set up a wiki page
>> here:
>> http://qemu-project.org/Outreachy_2015_DecemberMarch
>
> I've copied over unused projects from GSoC which I'm willing to mentor.
> Others are welcomed to mentor too ;-)

Great, since there is interest I will start looking into whether we can
secure sponsorship.

We need to find sponsors for the Outreachy interns that we take.
Unlike GSoC, funding is not automatically available for all interns.

Stefan


Re: KVM: Security Policy

2015-09-02 Thread Stefan Hajnoczi
On Thu, Aug 27, 2015 at 02:01:52PM +0200, Stefan Geißler wrote:
> Hello kvm mailing list,
> 
> I assume, this is a rather uncommon mailing list post since it is not
> directly related to the usage or development of KVM. Instead, the following
> is the case:
> 
> I am a student of computer science and am currently working on my masters
> thesis. The work in progress topic is "Mining vulnerability databases for
> information on hypervisor vulnerabilities: Analyses and Predictions". In the
> context of this research work i am analyzing various security related
> aspects regarding different hypervisors including KVM (A simple example
> contained in my analysis is the discovery process of security
> vulnerabilities and how the total number of disclosed vulnerabilities
> developes over time).
> 
> The reason i am writing this post to the public mailing list is, that i am
> looking for someone who might be willing to support me during my work with
> (for example) information and/or personal experience. Or simply said: May i
> post questions and ask for help explaining my findings from time to time or
> is this too much off-topic for this mailing list?

It's not off-topic.  I think it's in the interest of the community so
don't be afraid to engage the mailing list with your questions or
feedback on your findings.

> For now the question would be, whether there is some kind of a formal
> documentation of the vulnerability disclosure process or a security policy
> specific for KVM?

The kvm kernel module is part of Linux and there is a process for that:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SecurityBugs?id=HEAD

The QEMU emulator, which does device emulation in userspace, is a
separate project (used by KVM and Xen).  It has its own security process
here:
http://qemu-project.org/SecurityProcess

> If someone has any information regarding this, feel free to contact me
> directly through my personal mail address. Any help and information will be
> greatly appreciated!

Let's keep discussion on the mailing list (CC kvm@vger.kernel.org).
That way others can participate and it becomes archived/searchable.

Stefan


Re: [Qemu-devel] [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-09-01 Thread Stefan Hajnoczi
On Mon, Aug 31, 2015 at 02:51:50PM +0800, Xiao Guangrong wrote:
> 
> 
> On 08/28/2015 08:01 PM, Stefan Hajnoczi wrote:
> >On Wed, Aug 26, 2015 at 06:46:35PM +0800, Xiao Guangrong wrote:
> >>On 08/26/2015 12:23 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> >>>>  static void dsm_write(void *opaque, hwaddr addr,
> >>>>uint64_t val, unsigned size)
> >>>>  {
> >>>>+struct MemoryRegion *dsm_ram_mr = opaque;
> >>>>+struct dsm_buffer *dsm;
> >>>>+struct dsm_out *out;
> >>>>+void *buf;
> >>>>+
> >>>>  assert(val == NOTIFY_VALUE);
> >>>
> >>>The guest should not be able to cause an abort(3).  If val !=
> >>>NOTIFY_VALUE we can do nvdebug() and then return.
> >>
> >>The ACPI code and emulation code both are from qemu, if that happens,
> >>it's really a bug, aborting the VM is better than throwing a debug
> >>message under this case to avoid potential data corruption.
> >
> >abort(3) is dangerous because it can create a core dump.  If a malicious
> >guest triggers this repeatedly it could consume a lot of disk space and
> >I/O or CPU while performing the core dumps.
> >
> >We cannot trust anything inside the guest, even if the guest code comes
> >from QEMU because a malicious guest can still read/write to the same
> >hardware registers.
> >
> 
> Completely agree with you. :)
> 
> How about use exit{1} instead of abort() to kill the VM?

Most devices on a physical machine do not power off or reset the machine
in case of error.

I think it's good to follow that model and avoid killing the VM.
Otherwise nested virtualization or userspace drivers can take down the
whole VM.
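
A minimal sketch of the "log and ignore" model, as standalone C.  The
function name dsm_handle_notify() and the NOTIFY_VALUE placeholder are
assumptions for illustration; the real value and handler are defined by the
patch:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NOTIFY_VALUE 0x01  /* placeholder; the real value comes from the patch */

/*
 * Hypothetical handler skeleton: returns true if the write was processed
 * and false if it was ignored.  An unexpected value is logged and dropped,
 * so the guest cannot terminate the process by writing garbage.
 */
static bool dsm_handle_notify(uint64_t val)
{
    if (val != NOTIFY_VALUE) {
        fprintf(stderr, "nvdimm: unexpected notify value 0x%llx, ignoring\n",
                (unsigned long long)val);
        return false;
    }
    /* ... process the request here ... */
    return true;
}
```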

Stefan


Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-01 Thread Stefan Hajnoczi
On Mon, Aug 31, 2015 at 02:23:43PM +0800, Xiao Guangrong wrote:
> 
> Hi Stefan,
> 
> On 08/28/2015 07:58 PM, Stefan Hajnoczi wrote:
> 
> >
> >>>>+goto do_unmap;
> >>>>+}
> >>>>+
> >>>>+nvdimm->device_index = new_device_index();
> >>>>+sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> >>>>+memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> >>>>+   buf);
> >>>
> >>>How is the autogenerated name used?
> >>>
> >>>Why not just use "pc-nvdimm.memory"?
> >>
> >>Ah. Just for debug proposal :) and i am not sure if a name used for multiple
> >>MRs (MemoryRegion) is a good idea.
> >
> >Other devices use a constant name too (git grep
> >memory_region_init_ram_ptr) so it seems to be okay.  The unique thing is
> >the OBJECT(dev) which differs for each NVDIMM instance.
> >
> 
> When I was digging into live migration code, i noticed that the same MR name
> may cause the name "idstr", please refer to qemu_ram_set_idstr().
> 
> Since nvdimm devices do not have parent-bus, it will trigger the abort() in
> that function.

I see.  The other devices that use a constant name are on a bus so the
abort doesn't trigger.


Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data

2015-08-28 Thread Stefan Hajnoczi
On Wed, Aug 26, 2015 at 06:42:01PM +0800, Xiao Guangrong wrote:
 
 
> On 08/26/2015 12:16 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
> >>+#ifdef NVDIMM_DEBUG
> >>+#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
> >>+#else
> >>+#define nvdebug(...)
> >>+#endif
> >
> >The following allows the compiler to check format strings and syntax
> >check the argument expressions:
> >
> >#define NVDIMM_DEBUG 0  /* set to 1 for debug output */
> >#define nvdebug(fmt, ...) \
> >    if (NVDIMM_DEBUG) { \
> >        fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
> >    }
> >
> >This approach avoids bitrot (e.g. debug format string arguments have
> >become outdated).
> >
>
> Really good tips, thanks for your sharing.

I forgot the do { ... } while (0) in the macro to make nvdebug("hello
world"); work like a normal C statement.
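
For reference, a minimal sketch of the complete macro with the wrapper in
place.  check_handle() is a made-up caller that only exists to show the
if/else case; without do { ... } while (0) the trailing semicolon would end
the if statement early and the else would not compile.  The ## __VA_ARGS__
form is the GNU extension used in the original macro.

```c
#include <stdio.h>

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */

/*
 * The do { ... } while (0) wrapper turns the macro into a single statement,
 * so "if (x) nvdebug(...); else ..." parses the way it reads.  The
 * fprintf() call is always compiled (and its format string checked) but is
 * dead code when NVDIMM_DEBUG is 0.
 */
#define nvdebug(fmt, ...) \
    do { \
        if (NVDIMM_DEBUG) { \
            fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
        } \
    } while (0)

/* Hypothetical caller: returns 1 for a usable handle, 0 otherwise. */
static int check_handle(unsigned int handle)
{
    if (handle == 0)
        nvdebug("invalid handle %u\n", handle);
    else
        return 1;
    return 0;
}
```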


Re: [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-08-28 Thread Stefan Hajnoczi
On Wed, Aug 26, 2015 at 06:46:35PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:23 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> >>  static void dsm_write(void *opaque, hwaddr addr,
> >>                        uint64_t val, unsigned size)
> >>  {
> >>+    struct MemoryRegion *dsm_ram_mr = opaque;
> >>+    struct dsm_buffer *dsm;
> >>+    struct dsm_out *out;
> >>+    void *buf;
> >>+
> >>      assert(val == NOTIFY_VALUE);
> >
> >The guest should not be able to cause an abort(3).  If val !=
> >NOTIFY_VALUE we can do nvdebug() and then return.
>
> The ACPI code and emulation code both are from qemu, if that happens,
> it's really a bug, aborting the VM is better than throwing a debug
> message under this case to avoid potential data corruption.

abort(3) is dangerous because it can create a core dump.  If a malicious
guest triggers this repeatedly it could consume a lot of disk space and
I/O or CPU while performing the core dumps.

We cannot trust anything inside the guest, even if the guest code comes
from QEMU because a malicious guest can still read/write to the same
hardware registers.

Stefan


Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-08-28 Thread Stefan Hajnoczi
On Wed, Aug 26, 2015 at 06:40:26PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:03 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
> >
> >>+    if (fd < 0) {
> >>+        error_setg(errp, "can not open %s", nvdimm->file);
> >
> >s/can not/cannot/
> >
> >>+        return;
> >>+    }
> >>+
> >>+    size = get_file_size(fd);
> >>+    buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >
> >I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
> >This can be added in the future.
>
> Good idea, it will allow guest to write data but discards its content after it
> exits. Will implement O_RDONLY + MAP_PRIVATE in the near future.

Great.

> >>+        goto do_unmap;
> >>+    }
> >>+
> >>+    nvdimm->device_index = new_device_index();
> >>+    sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> >>+    memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> >>+                               buf);
> >
> >How is the autogenerated name used?
> >
> >Why not just use "pc-nvdimm.memory"?
>
> Ah. Just for debug proposal :) and i am not sure if a name used for multiple
> MRs (MemoryRegion) is a good idea.

Other devices use a constant name too (git grep
memory_region_init_ram_ptr) so it seems to be okay.  The unique thing is
the OBJECT(dev) which differs for each NVDIMM instance.


Re: [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> @@ -306,6 +354,18 @@ struct dsm_buffer {
>  static ram_addr_t dsm_addr;
>  static size_t dsm_size;
>  
> +struct cmd_out_implemented {

QEMU coding style uses typedef struct {} CamelCase.  Please follow this
convention in all user-defined structs (see ./CODING_STYLE).

>  static void dsm_write(void *opaque, hwaddr addr,
>                        uint64_t val, unsigned size)
>  {
> +    struct MemoryRegion *dsm_ram_mr = opaque;
> +    struct dsm_buffer *dsm;
> +    struct dsm_out *out;
> +    void *buf;
> +
>      assert(val == NOTIFY_VALUE);

The guest should not be able to cause an abort(3).  If val !=
NOTIFY_VALUE we can do nvdebug() and then return.
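
Roughly what I have in mind, as a standalone sketch rather than a drop-in
patch.  The NOTIFY_VALUE definition, the dsm_write_sketch() name and the
"handled" counter are placeholders invented for this example; the point is
only that a guest-controlled value is logged and ignored instead of being
fed to assert().

```c
#include <stdint.h>
#include <stdio.h>

#define NOTIFY_VALUE 0x99u  /* placeholder; the real value is not shown here */

static unsigned int handled;   /* counts requests that were accepted */

/*
 * MMIO write handler sketch.  `val` is written by the guest, so an
 * unexpected value is reported and ignored rather than passed to assert(),
 * which would let the guest terminate the host process.
 */
static void dsm_write_sketch(uint64_t val)
{
    if (val != NOTIFY_VALUE) {
        fprintf(stderr, "nvdimm: unexpected notify value %#llx\n",
                (unsigned long long)val);
        return;
    }

    handled++;  /* ...process the request here... */
}
```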

> +
> +    buf = memory_region_get_ram_ptr(dsm_ram_mr);
> +    dsm = buf;
> +    out = buf;
> +
> +    le32_to_cpus(&dsm->handle);
> +    le32_to_cpus(&dsm->arg1);
> +    le32_to_cpus(&dsm->arg2);

Can SMP guests modify DSM RAM while this thread is running?

We must avoid race conditions.  It's probably better to copy in data
before byte-swapping or checking input values.
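
A sketch of the copy-in pattern, assuming a simplified request layout.
struct dsm_req, le32_to_host() and dsm_fetch() are names made up for this
example and are not part of the patch; QEMU has its own byte-swapping
helpers that a real patch would use instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical request header living in memory the guest can write. */
struct dsm_req {
    uint32_t handle;    /* little-endian in guest memory */
    uint32_t arg1;
    uint32_t arg2;
};

/* Interpret the in-memory bytes of v as little-endian, on any host. */
static uint32_t le32_to_host(uint32_t v)
{
    const uint8_t *p = (const uint8_t *)&v;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Copy the request out of shared memory first, then byte-swap and validate
 * only the private copy.  Later guest writes to `shared` cannot change
 * values that have already been checked.  Returns 0 if the request is
 * acceptable, -1 otherwise.
 */
static int dsm_fetch(const void *shared, struct dsm_req *req,
                     uint32_t max_handle)
{
    memcpy(req, shared, sizeof(*req));

    req->handle = le32_to_host(req->handle);
    req->arg1 = le32_to_host(req->arg1);
    req->arg2 = le32_to_host(req->arg2);

    return req->handle <= max_handle ? 0 : -1;
}
```

Swapping in place with le32_to_cpus() on the shared page has the opposite
property: the guest can rewrite a field between the swap and the check.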


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> Changelog:
> - Use little endian for DSM method, thanks for Stefan's suggestion
>
> - introduce a new parameter, @configdata, if it's false, Qemu will
>   build a static and readonly namespace in memory and use it serving
>   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>   reserved region is needed at the end of the @file, it is good for
>   the user who want to pass whole nvdimm device and make its data
>   completely be visible to guest
>
> - divide the source code into separated files and add maintain info

I have skipped ACPI patches because I'm not very familiar with that
area.

Have you thought about live migration?

Are the contents of the NVDIMM migrated since they are registered as a
RAM region?

Stefan


Re: [PATCH v2 10/18] nvdimm: init the address region used by DSM method

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:03PM +0800, Xiao Guangrong wrote:
> @@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char *buf)
>      }
>  }
>  
> +struct dsm_buffer {
> +    /* RAM page. */
> +    uint32_t handle;
> +    uint8_t arg0[16];
> +    uint32_t arg1;
> +    uint32_t arg2;
> +    union {
> +        char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
> +    };
> +
> +    /* MMIO page. */
> +    union {
> +        uint32_t notify;
> +        char pedding[PAGE_SIZE];

s/pedding/padding/

> +    };
> +};
> +
> +static ram_addr_t dsm_addr;
> +static size_t dsm_size;
> +
> +static uint64_t dsm_read(void *opaque, hwaddr addr,
> +                         unsigned size)
> +{
> +    return 0;
> +}
> +
> +static void dsm_write(void *opaque, hwaddr addr,
> +                      uint64_t val, unsigned size)
> +{
> +}
> +
> +static const MemoryRegionOps dsm_ops = {
> +    .read = dsm_read,
> +    .write = dsm_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +static int build_dsm_buffer(void)
> +{
> +    MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
> +    ram_addr_t addr;;

s/;;/;/


Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
> +#ifdef NVDIMM_DEBUG
> +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
> +#else
> +#define nvdebug(...)
> +#endif

The following allows the compiler to check format strings and syntax
check the argument expressions:

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */
#define nvdebug(fmt, ...) \
    if (NVDIMM_DEBUG) { \
        fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
    }

This approach avoids bitrot (e.g. debug format string arguments have
become outdated).


Re: [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:08PM +0800, Xiao Guangrong wrote:
> Function 4 is used to get Namespace lable size

s/lable/label/


Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
> +static void set_file(Object *obj, const char *str, Error **errp)
> +{
> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> +    if (nvdimm->file) {
> +        g_free(nvdimm->file);
> +    }

g_free(NULL) is a nop so it's safe to replace the if with just
g_free(nvdimm->file).
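
The same idiom in plain C, as a sketch: free(NULL) is likewise defined to do
nothing, so the setter needs no guard.  struct nvdimm_state and this
set_file() signature are simplified stand-ins for the QOM code in the patch,
and standard free() is used here only so the example builds without GLib.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the device state; only the field we need. */
struct nvdimm_state {
    char *file;
};

/*
 * Replace the stored path.  free(NULL) is defined to do nothing, so no
 * "if (s->file)" guard is needed before releasing the old value.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int set_file(struct nvdimm_state *s, const char *str)
{
    size_t len = strlen(str) + 1;
    char *copy = malloc(len);

    if (!copy) {
        return -1;
    }
    memcpy(copy, str, len);

    free(s->file);      /* safe even when s->file is NULL */
    s->file = copy;
    return 0;
}
```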


Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index a53d235..7a270a8 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -24,6 +24,19 @@
>  
>  #include "hw/mem/pc-nvdimm.h"
>  
> +#define PAGE_SIZE  (1UL << 12)

This macro name is likely to collide with system headers or other code.

Could you use the existing TARGET_PAGE_SIZE constant instead?


Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote:
> The parameter @file is used as backed memory for NVDIMM which is
> divided into two parts if @dataconfig is true:

s/dataconfig/configdata/

> @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj)
>                               set_configdata, NULL);
>  }
>  
> +static uint64_t get_file_size(int fd)
> +{
> +    struct stat stat_buf;
> +    uint64_t size;
> +
> +    if (fstat(fd, &stat_buf) < 0) {
> +        return 0;
> +    }
> +
> +    if (S_ISREG(stat_buf.st_mode)) {
> +        return stat_buf.st_size;
> +    }
> +
> +    if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +        return size;
> +    }

#ifdef __linux__ for ioctl(fd, BLKGETSIZE64, &size)?

There is nothing Linux-specific about emulating NVDIMMs so this code
should compile on all platforms.
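
One way the guard could look, as a sketch.  Only the Linux block-device
branch is conditional; other platforms simply report 0 for block devices
until someone adds their equivalent ioctl.

```c
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/*
 * Returns the size in bytes of a regular file or, on Linux, a block
 * device.  Returns 0 if the size cannot be determined.
 */
static uint64_t get_file_size(int fd)
{
    struct stat stat_buf;

    if (fstat(fd, &stat_buf) < 0) {
        return 0;
    }

    if (S_ISREG(stat_buf.st_mode)) {
        return (uint64_t)stat_buf.st_size;
    }

#ifdef __linux__
    if (S_ISBLK(stat_buf.st_mode)) {
        uint64_t size;

        if (ioctl(fd, BLKGETSIZE64, &size) == 0) {
            return size;
        }
    }
#endif

    return 0;
}
```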

> +
> +    return 0;
> +}
> +
>  static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
>  {
>      PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
> +    char name[512];
> +    void *buf;
> +    ram_addr_t addr;
> +    uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE;
> +    int fd;
>  
>      if (!nvdimm->file) {
>          error_setg(errp, "file property is not set");
>      }

Missing return here.

> +
> +    fd = open(nvdimm->file, O_RDWR);

Does it make sense to support read-only NVDIMMs?

It could be handy for sharing a read-only file between unprivileged
guests.  The permissions on the file would only allow read, not write.

> +    if (fd < 0) {
> +        error_setg(errp, "can not open %s", nvdimm->file);

s/can not/cannot/

> +        return;
> +    }
> +
> +    size = get_file_size(fd);
> +    buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
This can be added in the future.

> +    if (buf == MAP_FAILED) {
> +        error_setg(errp, "can not do mmap on %s", nvdimm->file);
> +        goto do_close;
> +    }
> +
> +    nvdimm->config_data_size = config_size;
> +    if (nvdimm->configdata) {
> +        /* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. */
> +        nvdimm_size = size - config_size;
> +        nvdimm->config_data_addr = buf + nvdimm_size;
> +    } else {
> +        nvdimm_size = size;
> +        nvdimm->config_data_addr = NULL;
> +    }
> +
> +    if ((int64_t)nvdimm_size <= 0) {

The error cases can be detected before mmap(2).  That avoids the int64_t
cast and also avoids nvdimm_size underflow and the bogus
nvdimm->config_data_addr calculation above.

    size = get_file_size(fd);
    if (size == 0) {
        error_setg(errp, "empty file or unable to get file size");
        goto do_close;
    } else if (nvdimm->configdata && size < config_size) {
        error_setg(errp, "file size is too small to store NVDIMM"
                         " configure data");
        goto do_close;
    }
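
The same checks pulled out into a self-contained function, as a sketch.
check_backing_size() and its return convention are invented for this
example; the conditions are the ones from the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate the backing file size before mapping it.  Returns NULL on
 * success or a static error string.  On success *nvdimm_size holds the
 * number of bytes left for guest-visible memory.
 */
static const char *check_backing_size(uint64_t size, bool configdata,
                                      uint64_t config_size,
                                      uint64_t *nvdimm_size)
{
    if (size == 0) {
        return "empty file or unable to get file size";
    }
    if (configdata && size < config_size) {
        return "file size is too small to store NVDIMM configure data";
    }

    /* No underflow is possible here: size >= config_size was checked. */
    *nvdimm_size = configdata ? size - config_size : size;
    return NULL;
}
```

Note that a file exactly config_size bytes long passes these checks and
leaves zero bytes for the NVDIMM itself, which the original "<= 0" test
rejected.  Using "<=" in the second comparison would keep that behaviour.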

> +        error_setg(errp, "file size is too small to store NVDIMM"
> +                   " configure data");
> +        goto do_unmap;
> +    }
> +
> +    addr = reserved_range_push(nvdimm_size);
> +    if (!addr) {
> +        error_setg(errp, "do not have enough space for size %#lx.\n", size);

error_setg() messages must not have a newline at the end.

Please use "%#" PRIx64 instead of "%#lx" so compilation works on 32-bit
hosts where sizeof(long) == 4.
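
A short sketch of the portable format string.  format_size() is a made-up
helper that uses snprintf() in place of error_setg() so it can be built on
its own.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit size portably.  PRIx64 expands to the correct length
 * modifier for uint64_t on the host, whatever the width of long is.
 */
static int format_size(char *out, size_t out_len, uint64_t size)
{
    return snprintf(out, out_len, "size %#" PRIx64, size);
}
```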

> +        goto do_unmap;
> +    }
> +
> +    nvdimm->device_index = new_device_index();
> +    sprintf(name, "NVDIMM-%d", nvdimm->device_index);
> +    memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, nvdimm_size,
> +                               buf);

How is the autogenerated name used?

Why not just use "pc-nvdimm.memory"?

> +    vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
> +    memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
> +
> +    return;

fd is leaked.
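
One possible shape for the fix, as a sketch: close the descriptor on every
path, including success.  A mapping created by mmap(2) stays valid after
close(2), so nothing needs to hold on to the fd.  map_file() is a made-up
helper, not the realize function from the patch.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of `path` read-only.  The descriptor is closed on every
 * path, including success: an established mapping stays valid after
 * close(2).  Returns the mapping, or NULL on failure.
 */
static void *map_file(const char *path, size_t len)
{
    void *buf;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }

    buf = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        buf = NULL;
    }

    close(fd);          /* single exit point: no leak on success or error */
    return buf;
}
```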


Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM

2015-08-25 Thread Stefan Hajnoczi
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
> NVDIMM reserves all the free range above 4G to do:
> - Persistent Memory (PMEM) mapping
> - implement NVDIMM ACPI device _DSM method
>
> Signed-off-by: Xiao Guangrong <guangrong.x...@linux.intel.com>
> ---
>  hw/i386/pc.c               | 12 ++++++++++--
>  hw/mem/nvdimm/pc-nvdimm.c  | 13 +++++++++++++
>  include/hw/mem/pc-nvdimm.h |  1 +
>  3 files changed, 24 insertions(+), 2 deletions(-)

CCing Igor for memory hotplug-related changes.

> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 7661ea9..41af6ea 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -64,6 +64,7 @@
>  #include "hw/pci/pci_host.h"
>  #include "acpi-build.h"
>  #include "hw/mem/pc-dimm.h"
> +#include "hw/mem/pc-nvdimm.h"
>  #include "qapi/visitor.h"
>  #include "qapi-visit.h"
>  
> @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
>      MemoryRegion *ram_below_4g, *ram_above_4g;
>      FWCfgState *fw_cfg;
>      PCMachineState *pcms = PC_MACHINE(machine);
> +    ram_addr_t offset;
>  
>      assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
>  
> @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
>          exit(EXIT_FAILURE);
>      }
>  
> +    offset = 0x100000000ULL + above_4g_mem_size;
> +
>      /* initialize hotplug memory address space */
>      if (guest_info->has_reserved_memory &&
>          (machine->ram_size < machine->maxram_size)) {
> @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
>              exit(EXIT_FAILURE);
>          }
>  
> -        pcms->hotplug_memory.base =
> -            ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
> +        pcms->hotplug_memory.base = ROUND_UP(offset, 1ULL << 30);
>  
>          if (pcms->enforce_aligned_dimm) {
>              /* size hotplug region assuming 1G page max alignment per slot */
> @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine,
>                             "hotplug-memory", hotplug_mem_size);
>          memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
>                                      &pcms->hotplug_memory.mr);
> +
> +        offset = pcms->hotplug_memory.base + hotplug_mem_size;
>      }
>  
> +    /* all the space left above 4G is reserved for NVDIMM. */
> +    pc_nvdimm_reserve_range(offset);
> +
>      /* Initialize PC system firmware */
>      pc_system_firmware_init(rom_memory, guest_info->isapc_ram_fw);
>  
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index a53d235..7a270a8 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -24,6 +24,19 @@
>  
>  #include "hw/mem/pc-nvdimm.h"
>  
> +#define PAGE_SIZE  (1UL << 12)
> +
> +static struct nvdimms_info {
> +    ram_addr_t current_addr;
> +} nvdimms_info;
> +
> +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
> +void pc_nvdimm_reserve_range(ram_addr_t offset)
> +{
> +    offset = ROUND_UP(offset, PAGE_SIZE);
> +    nvdimms_info.current_addr = offset;
> +}
> +
>  static char *get_file(Object *obj, Error **errp)
>  {
>      PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
> index 51152b8..8601e9b 100644
> --- a/include/hw/mem/pc-nvdimm.h
> +++ b/include/hw/mem/pc-nvdimm.h
> @@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice {
>  #define PC_NVDIMM(obj) \
>      OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
>  
> +void pc_nvdimm_reserve_range(ram_addr_t offset);
>  #endif
> -- 
> 2.4.3
 


Re: Fwd: KVM : Virtio ring size

2015-08-10 Thread Stefan Hajnoczi
On Fri, Aug 07, 2015 at 10:48:50AM +0530, sai kiran wrote:
> I am experimenting on Virtio-net frontend driver. And I observe that
> the virtio ring size is communicated to guest as 256.
> I tried changing backend-qemu code manually, to propagate 512 ring size.
>
> But other than changing code and hardcoding, Is there anyway to
> configure the virtio ring size.

The ring size is hardcoded in the host.

If ring size is a problem, first check that the indirect vring feature
is enabled.  It allows each packet to take just 1 (indirect) descriptor
in the virtqueue.
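
To put numbers on that, here is a small sketch of the arithmetic.  The
function names are made up and the buffer count per packet is whatever your
driver actually uses; this only models ring occupancy, not the vring layout.

```c
#include <stddef.h>

/*
 * Number of virtqueue ring entries one request consumes.  With indirect
 * descriptors the buffers are described in a separate table and the ring
 * holds a single descriptor pointing at it.
 */
static size_t ring_slots_used(size_t num_buffers, int use_indirect)
{
    if (num_buffers == 0) {
        return 0;
    }
    return use_indirect ? 1 : num_buffers;
}

/* How many such requests fit in a ring of ring_size descriptors. */
static size_t max_in_flight(size_t ring_size, size_t num_buffers,
                            int use_indirect)
{
    size_t slots = ring_slots_used(num_buffers, use_indirect);

    return slots ? ring_size / slots : 0;
}
```

For example, with a 256-entry ring and an assumed 4 buffers per packet,
max_in_flight(256, 4, 0) is 64 while max_in_flight(256, 4, 1) is 256.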

Stefan




Re: Live migration using shared storage in different networks

2015-07-09 Thread Stefan Hajnoczi
On Mon, Jul 06, 2015 at 09:44:07AM +0100, Miguel Barbosa Gonçalves wrote:
> I am building a KVM cluster that needs VM live migration.
>
> My shared storage as well as the KVM hosts will be running
> CentOS.
>
> Because 10 Gbps Ethernet switches are very expensive at the
> moment I will connect the KVM hosts to the storage by
> cross-over cables and create private networks for each
> connection (10.0.0.0/30 and 10.0.0.4/30).
>
> The following diagram shows the topology
>
> Management             Management            Management
>    VLAN                   VLAN                  VLAN
>      |                      |                     |
> +----+-----+  10 Gbps  +----+----+  10 Gbps  +----+-----+
> | KVM Host |-----------| Storage |-----------| KVM Host |
> +----+-----+           +----+----+           +----+-----+
>      |                      |                     |
>   Public                 Public                Public
>    VLAN                   VLAN                  VLAN
>
> My question is: will live migration work in this configuration
> since the storage will have 2 different IP addresses
> (10.0.0.1 and 10.0.0.5) in 2 different networks even though
> it is the same storage?

At the QEMU level this works.

At the libvirt level you may need to be careful how you configure the
storage and domains (VMs).  You need to make sure that the storage
network details are not part of the same VM configuration that gets
applied on both hosts.

For example: if you NFS mount or attach iSCSI LUNs to the host and then
just give libvirt the path to the image file, it will work.  But if you
want libvirt to do the NFS/iSCSI setup for you then you'll have to dig
into the configuration documentation (http://libvirt.org/) and it may
not be possible.

Stefan




Re: [PATCH 14/16] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function

2015-07-02 Thread Stefan Hajnoczi
On Wed, Jul 01, 2015 at 10:50:30PM +0800, Xiao Guangrong wrote:
> +static uint32_t dsm_cmd_config_size(struct dsm_buffer *in, struct dsm_out *out)
> +{
> +    GSList *list = get_nvdimm_built_list();
> +    PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
> +    uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
> +
> +    if (!nvdimm) {
> +        goto exit;
> +    }
> +
> +    status = NFIT_STATUS_SUCCESS;
> +    out->cmd_config_size.config_size = nvdimm->config_data_size;
> +    out->cmd_config_size.max_xfer = max_xfer_config_size();

cpu_to_*() missing?

It should be possible to emulate NVDIMMs for a x86_64 guest on a
big-endian host, for example.
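
To illustrate what the conversion buys, a portable sketch that writes
fields in little-endian order byte by byte.  store_le32() and load_le32()
are names made up for this example; in QEMU you would use the existing
cpu_to_le32() family instead.

```c
#include <stdint.h>

/*
 * Store a 32-bit value in little-endian byte order regardless of the
 * host's endianness, by writing the bytes explicitly.
 */
static void store_le32(uint8_t *dst, uint32_t val)
{
    dst[0] = (uint8_t)(val & 0xff);
    dst[1] = (uint8_t)((val >> 8) & 0xff);
    dst[2] = (uint8_t)((val >> 16) & 0xff);
    dst[3] = (uint8_t)((val >> 24) & 0xff);
}

/* Read it back the same way. */
static uint32_t load_le32(const uint8_t *src)
{
    return (uint32_t)src[0] |
           ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) |
           ((uint32_t)src[3] << 24);
}
```

A plain assignment such as out->field = value stores host byte order, which
is only correct by accident on little-endian hosts.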




Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM

2015-07-02 Thread Stefan Hajnoczi
On Thu, Jul 02, 2015 at 02:34:05PM +0800, Xiao Guangrong wrote:
> On 07/02/2015 02:17 PM, Michael S. Tsirkin wrote:
> >On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote:
> >>  hw/acpi/aml-build.c         |   32 +-
> >>  hw/i386/acpi-build.c        |    9 +-
> >>  hw/i386/acpi-dsdt.dsl       |    2 +-
> >>  hw/i386/pc.c                |   11 +-
> >>  hw/mem/Makefile.objs        |    1 +
> >>  hw/mem/pc-nvdimm.c          | 1040 +++
> >>  include/hw/acpi/aml-build.h |    5 +-
> >>  include/hw/mem/pc-nvdimm.h  |   56 +++
> >>  8 files changed, 1149 insertions(+), 7 deletions(-)
> >>  create mode 100644 hw/mem/pc-nvdimm.c
> >>  create mode 100644 include/hw/mem/pc-nvdimm.h
> >
> >Given the amount of code, this is definitely not 2.4 material.
> >Maybe others will have the time to review it before this, but
> >in any case please remember to repost after 2.4 is out.
>
> I see, thanks for your reminder, Michael!

I will review the series now.

Here is the QEMU release schedule:
http://qemu-project.org/Planning/2.4

Hard freeze - 7 July

QEMU 2.4 release - 4 August

It could be merged into a maintainer's tree when the -next branches are
opened (it's up to each maintainer but for the block and net trees I do
that at hard freeze time).




Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM

2015-07-02 Thread Stefan Hajnoczi
On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote:
> == Background ==
> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
> on Intel's platform. They are discovered via ACPI and configured by _DSM
> method of NVDIMM device in ACPI. There are some supporting documents which
> can be found at:
> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
>
> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
> this patchset tries to enable it in virtualization field

From a device model perspective, have you checked whether it makes sense
to integrate nvdimms into the pc-dimm and hostmem code that is used for
memory hotplug and NUMA?

The NVDIMM device in your patches is a completely new TYPE_DEVICE so it
doesn't share any interfaces or code with existing memory devices.
Maybe that is the right solution here because NVDIMMs have different
characteristics, but I'm not sure.




Re: Announcing qboot, a minimal x86 firmware for QEMU

2015-06-05 Thread Stefan Hajnoczi
On Tue, May 26, 2015 at 9:47 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Fri, May 22, 2015 at 10:53:54AM +0800, Yong Wang wrote:
>> On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
>> > On the QEMU side, there is no support yet for persistent memory and the
>> > NFIT tables from ACPI 6.0.  Once that (and ACPI support) is added, qboot
>> > will automatically start using it.
>> >
>>
>> We are working on adding NFIT support into virtual bios.
>
> Great.  I asked about this on the #pmem (irc.oftc.net) IRC channel last week.
>
> Which virtual bios are you targeting?

Ping?

Interest in persistent memory is picking up and I'd like to avoid
duplicating work.  Which pieces do you have patches for?

1. QEMU -device pmem,file=/path/to/dax/file,id=pmem1 and fw_cfg/ACPI
info that gets passed to the guest
2. SeaBIOS NFIT ACPI table
3. ACPI NVDIMM DSM (probably not much needed, most features would be disabled)

Thanks,
Stefan


[RFC 6/6] VSOCK: Add Makefile and Kconfig

2015-05-27 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 drivers/vhost/Kconfig       |  4 ++++
 drivers/vhost/Kconfig.vsock |  7 +++++++
 drivers/vhost/Makefile      |  4 ++++
 net/vmw_vsock/Kconfig       | 18 ++++++++++++++++++
 net/vmw_vsock/Makefile      |  2 ++
 5 files changed, 35 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 017a1e8..169fb19 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -32,3 +32,7 @@ config VHOST
---help---
  This option is selected by any driver which needs to access
  the core of vhost.
+
+if STAGING
+source "drivers/vhost/Kconfig.vsock"
+endif
diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
new file mode 100644
index 0000000..3491865
--- /dev/null
+++ b/drivers/vhost/Kconfig.vsock
@@ -0,0 +1,7 @@
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   default n
+   ---help---
+   Say M here to enable the vhost-vsock for virtio-vsock guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..74e0bc8 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine runs on Qemu/KVM.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.4.1



[RFC 5/6] VSOCK: Introduce vhost-vsock.ko

2015-05-27 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets vhost transport implementation. This module runs in host
kernel.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 drivers/vhost/vsock.c | 596 ++
 drivers/vhost/vsock.h |   4 +
 2 files changed, 600 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 0000000..a9514aa
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,596 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ *         Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <net/sock.h>
+#include <linux/virtio_vsock.h>
+#include <linux/vhost.h>
+
+#include <net/af_vsock.h>
+#include "vhost.h"
+#include "vsock.h"
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+static int vhost_transport_socket_init(struct vsock_sock *vsk,
+  struct vsock_sock *psk);
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock_virtqueue {
+   struct vhost_virtqueue vq;
+};
+
+struct vhost_vsock {
+   /* Vhost device */
+   struct vhost_dev dev;
+   /* Vhost vsock virtqueue*/
+   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
+   /* Link to global vhost_vsock_list*/
+   struct list_head list;
+   /* Head for pkt from host to guest */
+   struct list_head send_pkt_list;
+   /* Work item to send pkt */
+   struct vhost_work send_pkt_work;
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest contex id this vhost_vsock instance handles */
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
+   return cid;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+	struct vhost_vsock *vsock;
+
+	mutex_lock(&vhost_vsock_mutex);
+	list_for_each_entry(vsock, &vhost_vsock_list, list) {
+		if (vsock->guest_cid == guest_cid) {
+			mutex_unlock(&vhost_vsock_mutex);
+			return vsock;
+		}
+	}
+	mutex_unlock(&vhost_vsock_mutex);
+
+	return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+			    struct vhost_virtqueue *vq)
+{
+	struct virtio_vsock_pkt *pkt;
+	bool added = false;
+	unsigned out, in;
+	struct sock *sk;
+	int head, ret;
+
+	mutex_lock(&vq->mutex);
+	vhost_disable_notify(&vsock->dev, vq);
+	for (;;) {
+		if (list_empty(&vsock->send_pkt_list)) {
+			vhost_enable_notify(&vsock->dev, vq);
+			break;
+		}
+
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+		pr_debug("%s: head = %d\n", __func__, head);
+		if (head < 0)
+			break;
+
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+				vhost_disable_notify(&vsock->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+		/* TODO check out == 0 and in >= 1 */
+
+		pkt = list_first_entry(&vsock->send_pkt_list,
+				       struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+
+		/* FIXME: no assumption of frame layout */
+		ret = __copy_to_user(vq->iov[0].iov_base, &pkt->hdr,
+				     sizeof(pkt->hdr));
+		if (ret) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Faulted on copying pkt hdr\n");
+			break;
+		}
+		if (pkt->buf && pkt->len > 0) {
+			/* TODO avoid iov[1].iov_base buffer overflow, check pkt->len! */
+			ret = __copy_to_user(vq->iov[1].iov_base, pkt->buf,
+					     pkt->len);
+			if (ret) {
+				virtio_transport_free_pkt(pkt);
+				vq_err(vq, "Faulted on copying pkt buf\n");
+				break;
+			}
+		}
+
+   vhost_add_used(vq, head, pkt

[RFC 2/6] Add dgram_skb to vsock_sock

2015-05-27 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

This list will be used to match received packets when multiple packets
are used because datagram size is larger than the receive buffer size.

Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/net/af_vsock.h   | 1 +
 net/vmw_vsock/af_vsock.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d52b984..bc9055c 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -58,6 +58,7 @@ struct vsock_sock {
     */
    struct list_head pending_links;
    struct list_head accept_queue;
+   struct list_head dgram_skb;
    bool rejected;
    struct delayed_work dwork;
    u32 peer_shutdown;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index ae3ce3d..0b3c498 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -684,6 +684,7 @@ struct sock *__vsock_create(struct net *net,
    vsk->listener = NULL;
    INIT_LIST_HEAD(&vsk->pending_links);
    INIT_LIST_HEAD(&vsk->accept_queue);
+   INIT_LIST_HEAD(&vsk->dgram_skb); /* TODO free list entries on shutdown and limit list size or timeout somehow? */
    vsk->rejected = false;
    vsk->sent_request = false;
    vsk->ignore_connecting_rst = false;
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 0/6] Add virtio transport for AF_VSOCK

2015-05-27 Thread Stefan Hajnoczi
This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest-host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM and SOCK_DGRAM semantics.  Applications on the host can easily
connect to guest agents because the sockets API allows multiple connections to
a listen socket (unlike virtio-serial).  This simplifies the guest-host
communication and eliminates the need for extra processes on the host to
arbitrate virtio-serial ports.

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
bind to it).
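
For illustration only, the addressing described above could be exercised from
userspace roughly as follows.  This is a minimal sketch and not part of the
series: it assumes a libc and kernel headers that provide AF_VSOCK and
<linux/vm_sockets.h>, and the helper names (vsock_addr_fill,
vsock_connect_host, vsock_listen_any) and the port number are made up for the
example.  VMADDR_CID_HOST is the header's constant for the host's CID (2).

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: fill a sockaddr_vm for the given CID and port. */
void vsock_addr_fill(struct sockaddr_vm *sa, unsigned int cid,
                     unsigned int port)
{
    memset(sa, 0, sizeof(*sa));
    sa->svm_family = AF_VSOCK;
    sa->svm_cid = cid;
    sa->svm_port = port;
}

/* Guest side: connect a stream socket to the host (CID 2).
 * Returns a connected fd, or -1 on failure.
 */
int vsock_connect_host(unsigned int port)
{
    struct sockaddr_vm sa;
    int fd;

    vsock_addr_fill(&sa, VMADDR_CID_HOST, port);

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Either side: bind to whatever CID was assigned (VMADDR_CID_ANY)
 * and listen.  Returns a listening fd, or -1 on failure.
 */
int vsock_listen_any(unsigned int port, int backlog)
{
    struct sockaddr_vm sa;
    int fd;

    vsock_addr_fill(&sa, VMADDR_CID_ANY, port);

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A guest agent might call vsock_listen_any(1234, 8) and accept() connections as
with any other stream socket, while a peer in the guest that wants to reach a
host service would use vsock_connect_host(1234).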

Status
------
I am auditing and testing the code, while iterating the virtio device
specification.  There is scope to change both the implementation (these
patches) and the virtio device specification.

TODO:
 * Flexible virtqueue descriptor layout
 * Avoid Linux-specific constants in packet headers (SOCK_STREAM/SOCK_DGRAM)
 * Send RST if there is no listening SOCK_STREAM socket
 * Add missing input validation for packet headers and vhost ioctls

Asias He (6):
  VSOCK: Introduce vsock_find_unbound_socket and
vsock_bind_dgram_generic
  Add dgram_skb to vsock_sock
  VSOCK: Introduce virtio-vsock-common.ko
  VSOCK: Introduce virtio-vsock.ko
  VSOCK: Introduce vhost-vsock.ko
  VSOCK: Add Makefile and Kconfig

 drivers/vhost/Kconfig   |4 +
 drivers/vhost/Kconfig.vsock |7 +
 drivers/vhost/Makefile  |4 +
 drivers/vhost/vsock.c   |  596 +++
 drivers/vhost/vsock.h   |4 +
 include/linux/virtio_vsock.h|  207 +
 include/net/af_vsock.h  |3 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vsock.h   |   80 ++
 net/vmw_vsock/Kconfig   |   18 +
 net/vmw_vsock/Makefile  |2 +
 net/vmw_vsock/af_vsock.c|   71 ++
 net/vmw_vsock/virtio_transport.c|  450 +++
 net/vmw_vsock/virtio_transport_common.c | 1248 +++
 14 files changed, 2695 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport.c
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

-- 
2.4.1



[RFC 4/6] VSOCK: Introduce virtio-vsock.ko

2015-05-27 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

VM sockets virtio transport implementation. This module runs in guest
kernel.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 net/vmw_vsock/virtio_transport.c | 450 +++
 1 file changed, 450 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 000..ebe1eef
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,450 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He <as...@redhat.com>
+ * Stefan Hajnoczi <stefa...@redhat.com>
+ *
+ * Some of the code is taken from Gerd Hoffmann <kra...@redhat.com>'s
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vsock.h>
+#include <net/sock.h>
+#include <linux/mutex.h>
+#include <net/af_vsock.h>
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   /* Virtio device */
+   struct virtio_device *vdev;
+   /* Virtio virtqueue */
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Work item to send pkt */
+   struct work_struct tx_work;
+   /* Work item to recv pkt */
+   struct work_struct rx_work;
+   /* Mutex to protect send pkt*/
+   struct mutex tx_lock;
+   /* Mutex to protect recv pkt*/
+   struct mutex rx_lock;
+   /* Number of recv buffers */
+   int rx_buf_nr;
+   /* Number of max recv buffers */
+   int rx_buf_max_nr;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest context id, just like guest ip address */
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+                          struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtio_transport *trans;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   struct scatterlist hdr, buf, *sgs[2];
+   struct virtqueue *vq;
+   u32 pkt_len = info->pkt_len;
+   DEFINE_WAIT(wait);
+
+   vsock = virtio_vsock_get();
+   if (!vsock)
+       return -ENODEV;
+
+   src_cid = virtio_transport_get_local_cid();
+   src_port = vsk->local_addr.svm_port;
+   if (!info->remote_cid) {
+       dst_cid = vsk->remote_addr.svm_cid;
+       dst_port = vsk->remote_addr.svm_port;
+   } else {
+       dst_cid = info->remote_cid;
+       dst_port = info->remote_port;
+   }
+
+   trans = vsk->trans;
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+       pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+   pkt_len = virtio_transport_get_credit(trans, pkt_len);
+   /* Do not send zero length OP_RW pkt*/
+   if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+       return pkt_len;
+
+   /* Respect global tx buf limitation */
+   mutex_lock(&vsock->tx_lock);
+   while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
+       prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
+                                 TASK_UNINTERRUPTIBLE);
+       mutex_unlock(&vsock->tx_lock);
+       schedule();
+       mutex_lock(&vsock->tx_lock);
+       finish_wait(&vsock->queue_wait, &wait);
+   }
+   vsock->total_tx_buf += pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+
+   pkt = virtio_transport_alloc_pkt(vsk, info, pkt_len,
+                                    src_cid, src_port,
+                                    dst_cid, dst_port);
+   if (!pkt) {
+       /* TODO what about decrementing total_tx_buf */
+       virtio_transport_put_credit(trans, pkt_len);
+       return -ENOMEM;
+   }
+
+   pr_debug("%s:info->pkt_len= %d\n", __func__, info->pkt_len);
+
+   /* Will be released in virtio_transport_send_pkt_work */
+   sock_hold(&trans->vsk->sk);
+   virtio_transport_inc_tx_pkt(pkt);
+
+   /* Put pkt in the virtqueue */
+   sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
+   sgs[out_sg++] = &hdr;
+   if (info->msg && info->pkt_len

[RFC 1/6] VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic

2015-05-27 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/net/af_vsock.h   |  2 ++
 net/vmw_vsock/af_vsock.c | 70 
 2 files changed, 72 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 172632d..d52b984 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -172,8 +172,10 @@ void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2ec86e6..ae3ce3d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -224,6 +224,17 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
    return NULL;
 }
 
+static struct sock *__vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct vsock_sock *vsk;
+
+   list_for_each_entry(vsk, vsock_unbound_sockets, bound_table)
+       if (addr->svm_port == vsk->local_addr.svm_port)
+           return sk_vsock(vsk);
+
+   return NULL;
+}
+
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
                                                   struct sockaddr_vm *dst)
 {
@@ -299,6 +310,21 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 }
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct sock *sk;
+
+   spin_lock_bh(&vsock_table_lock);
+   sk = __vsock_find_unbound_socket(addr);
+   if (sk)
+       sock_hold(sk);
+
+   spin_unlock_bh(&vsock_table_lock);
+
+   return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_unbound_socket);
+
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
                                          struct sockaddr_vm *dst)
 {
@@ -533,6 +559,50 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
    return 0;
 }
 
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr)
+{
+   static u32 port = LAST_RESERVED_PORT + 1;
+   struct sockaddr_vm new_addr;
+
+   vsock_addr_init(&new_addr, addr->svm_cid, addr->svm_port);
+
+   if (addr->svm_port == VMADDR_PORT_ANY) {
+       bool found = false;
+       unsigned int i;
+
+       for (i = 0; i < MAX_PORT_RETRIES; i++) {
+           if (port <= LAST_RESERVED_PORT)
+               port = LAST_RESERVED_PORT + 1;
+
+           new_addr.svm_port = port++;
+
+           if (!__vsock_find_unbound_socket(&new_addr)) {
+               found = true;
+               break;
+           }
+       }
+
+       if (!found)
+           return -EADDRNOTAVAIL;
+   } else {
+       /* If port is in reserved range, ensure caller
+        * has necessary privileges.
+        */
+       if (addr->svm_port <= LAST_RESERVED_PORT &&
+           !capable(CAP_NET_BIND_SERVICE)) {
+           return -EACCES;
+       }
+
+       if (__vsock_find_unbound_socket(&new_addr))
+           return -EADDRINUSE;
+   }
+
+   vsock_addr_init(&vsk->local_addr, new_addr.svm_cid, new_addr.svm_port);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_bind_dgram_generic);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
                               struct sockaddr_vm *addr)
 {
-- 
2.4.1
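
For readers following the VMADDR_PORT_ANY branch of vsock_bind_dgram_generic()
in this patch, the port search can be modelled in userspace roughly as below.
This is only a sketch under stated assumptions, not the kernel code: the two
constants are given the values af_vsock.c is believed to use (1023 and 24),
and pick_dgram_port(), port_in_use_fn and port_below_limit() are hypothetical
names standing in for the loop and for __vsock_find_unbound_socket().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; the patch only references these names. */
#define LAST_RESERVED_PORT 1023
#define MAX_PORT_RETRIES   24

/* Hypothetical stand-in for the "is this port already taken?" lookup. */
typedef bool (*port_in_use_fn)(uint32_t port, void *ctx);

/* Try up to MAX_PORT_RETRIES consecutive ports starting at *next_port,
 * skipping the reserved range.  On success store the port in *out and
 * return 0; return -1 if every candidate was in use.
 */
int pick_dgram_port(uint32_t *next_port, port_in_use_fn in_use, void *ctx,
                    uint32_t *out)
{
    unsigned int i;

    for (i = 0; i < MAX_PORT_RETRIES; i++) {
        uint32_t candidate;

        /* Also catches wrap-around of the 32-bit counter to 0. */
        if (*next_port <= LAST_RESERVED_PORT)
            *next_port = LAST_RESERVED_PORT + 1;

        candidate = (*next_port)++;

        if (!in_use(candidate, ctx)) {
            *out = candidate;
            return 0;
        }
    }
    return -1;
}

/* Example predicate: every port below *(uint32_t *)ctx counts as busy. */
bool port_below_limit(uint32_t port, void *ctx)
{
    return port < *(uint32_t *)ctx;
}
```

The counter is kept across calls (the patch uses a function-local static), so
successive binds walk forward through the port space instead of rescanning
from the first unreserved port each time.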



[RFC 3/6] VSOCK: Introduce virtio-vsock-common.ko

2015-05-27 Thread Stefan Hajnoczi
From: Asias He <as...@redhat.com>

This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.

Signed-off-by: Asias He <as...@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
---
 include/linux/virtio_vsock.h|  207 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vsock.h   |   80 ++
 net/vmw_vsock/virtio_transport_common.c | 1248 +++
 4 files changed, 1536 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 000..01d84a5
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,207 @@
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2013-2015
+ * Copyright (C) Asias He <as...@redhat.com>, 2013
+ * Copyright (C) Stefan Hajnoczi <stefa...@redhat.com>, 2015
+ */
+
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include <uapi/linux/virtio_vsock.h>
+#include <linux/socket.h>
+#include <net/sock.h>
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE      (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE          0xFFFFFFFFUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE      (1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE       (1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE        (1024 * 64)
+
+struct vsock_transport_recv_notify_data;
+struct vsock_transport_send_notify_data;
+struct sockaddr_vm;
+struct vsock_sock;
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* virtio transport socket state */
+struct virtio_transport {
+   struct virtio_transport_pkt_ops *ops;
+   struct vsock_sock *vsk;
+
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   struct list_head rx_queue;
+   u32 rx_bytes;
+
+   /* Protected by trans->tx_lock */
+   u32 tx_cnt;
+   u32 buf_alloc;
+   u32 peer_fwd_cnt;
+   u32 peer_buf_alloc;
+   /* Protected by trans->rx_lock */
+   u32 fwd_cnt;
+
+   u16 dgram_id;
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct virtio_transport *trans;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+   u32 remote_cid, remote_port;
+   struct msghdr *msg;
+   u32 pkt_len;
+   u16 type;
+   u16 op;
+   u32 flags;
+   u16 dgram_id;
+   u16 dgram_len;
+};
+
+struct virtio_transport_pkt_ops {
+   int (*send_pkt)(struct vsock_sock *vsk,
+   struct virtio_vsock_pkt_info *info);
+};
+
+void virtio_vsock_dumppkt(const char *func,
+ const struct virtio_vsock_pkt *pkt);
+
+struct sock *
+virtio_transport_get_pending(struct sock *listener

Re: Announcing qboot, a minimal x86 firmware for QEMU

2015-05-26 Thread Stefan Hajnoczi
On Fri, May 22, 2015 at 10:53:54AM +0800, Yong Wang wrote:
> On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
> > On the QEMU side, there is no support yet for persistent memory and the
> > NFIT tables from ACPI 6.0.  Once that (and ACPI support) is added, qboot
> > will automatically start using it.
> >
>
> We are working on adding NFIT support into virtual bios.

Great.  I asked about this on the #pmem (irc.oftc.net) IRC channel last week.

Which virtual bios are you targeting?

Stefan




Re: [GSoC] project proposal

2015-04-23 Thread Stefan Hajnoczi
On Wed, Apr 22, 2015 at 9:51 AM, Catalin Vasile
<catalinvasil...@gmail.com> wrote:
> On Wed, Apr 22, 2015 at 11:20 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Tue, Apr 21, 2015 at 04:07:56PM +0200, Paolo Bonzini wrote:
>>> On 21/04/2015 16:07, Catalin Vasile wrote:
>>> > I don't get the part with getting cryptodev upstream.
>>> > I don't know what getting cryptodev upstream actually implies.
>>> > From what I know cryptodev is done (is a functional project) that was
>>> > rejected in the Linux Kernel
>>> > and there isn't actually way to get it upstream.
>>>
>>> Yes, I agree.
>>
>> The limitations of AF_ALG need to be addressed somehow, so what is the next
>> step?
>>
>> Stefan
>
> If we want a mainstream userspace backend that could interact with a
> lot of crypto engines, we could use OpenSSL (it can actually use
> cryptodev and AF_ALG as engines).
> For now, until mid June (my diploma project presentation) I still want
> to use vhost as a backend for the sole purpose of having a finished
> backend which now I have a good grasp upon.

I understand.

Once you have a first approximation of the new virtio crypto device
interface, I suggest continuing the discussion with the VIRTIO working
group:
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio#feedback

If you send a virtio spec proposal you can get feedback.

Stefan


Re: [GSoC] project proposal

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 04:07:56PM +0200, Paolo Bonzini wrote:
> On 21/04/2015 16:07, Catalin Vasile wrote:
> > I don't get the part with getting cryptodev upstream.
> > I don't know what getting cryptodev upstream actually implies.
> > From what I know cryptodev is done (is a functional project) that was
> > rejected in the Linux Kernel
> > and there isn't actually way to get it upstream.
>
> Yes, I agree.

The limitations of AF_ALG need to be addressed somehow, so what is the next
step?

Stefan




Re: [GSoC] project proposal

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 05:24:55PM +0300, Catalin Vasile wrote:
> Can you give me more details on GnuTLS?
> I'm going through some documentation and code and I see that it
> doesn't actually have separate encryption and authentication
> primitives.

gnutls is a natural choice because QEMU already uses it for TLS, but if
it doesn't support the primitives you need, then AF_ALG could be used
directly.

http://www.gnutls.org/manual/gnutls.html#Using-GnuTLS-as-a-cryptographic-library

Stefan




x2apic issues with Solaris and Xen guests

2015-04-20 Thread Stefan Hajnoczi
I wonder whether the following two x2apic issues are related:

Solaris 10 U11 network doesn't work
https://bugzilla.redhat.com/show_bug.cgi?id=1040500

kvm - fails to setup timer interrupt via io-apic
(Thanks to Michael Tokarev for posting this link)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528077#68

It seems KVM's x2apic emulation works with regular Linux and Windows
guests, but not necessarily with other OSes.

Has anyone looked into this?

Stefan


Re: [GSoC] project proposal

2015-03-31 Thread Stefan Hajnoczi
On Wed, Mar 18, 2015 at 8:59 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 18/03/2015 18:05, Catalin Vasile wrote:
>> cryptodev is not merged into upstream from what I know.
>
> Yes, but QEMU runs on non-Linux platforms too.  Of course doing
> vhost+driver or gnutls+driver would be already more than enough for the
> summer.

My suggestion is to work on the gnutls driver.  Then, if you have time
left, get cryptodev upstream (it can be part of your GSoC project
plan).

That approach is more beneficial in the long run.  It will allow other
applications to use the Crypto API too.

vhost is good for exploiting kernel-only functionality (usually due to
security/reliability boundaries).  In this case the only reason for
vhost is that the userspace API isn't ready yet.  Use the opportunity
to contribute to that effort instead of working around it.

Stefan


Re: iscsi multipath failure with libvirtError: Failed to open file '/dev/mapper/Mar': No such file or directory

2015-03-27 Thread Stefan Hajnoczi
On Mon, Mar 23, 2015 at 10:14:31PM +0530, mad Engineer wrote:
> hello All,
> I know the issue is related to libvirt,but i dont know
> where to ask.

The libvirt mailing list is the place to ask libvirt questions.  I have
CCed it.

> i have centos 6.6 running KVM as compute node in openstack icehouse
>
> when i try to attach volume to instance it shows
>
> 2596: error : virStorageFileGetMetadataRecurse:952 : Failed to open
> file '/dev/mapper/Mar': No such file or directory
>
> in libvirt log
>
> This does not always happen when it happens no one will be able to
> attach volume to instance
>
>
> using EMC VNX as storage backend.
>
>
> multipath.conf
>
>
> # Skip the files uner /dev that are definitely not FC/iSCSI devices
> # Different system may need different customization
> devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
> devnode "^hd[a-z][0-9]*"
> devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
>
> # Skip LUNZ device from VNX
> device {
> vendor "DGC"
> product "LUNZ"
> }
> }
>
> defaults {
> user_friendly_names no
> flush_on_last_del yes
> }
>
> devices {
> # Device attributed for EMC CLARiiON and VNX series ALUA
> device {
> vendor "DGC"
> product ".*"
> product_blacklist "LUNZ"
> path_grouping_policy group_by_prio
> path_selector "round-robin 0"
> path_checker emc_clariion
> features "1 queue_if_no_path"
> hardware_handler "1 alua"
> prio alua
> failback immediate
> }
> }
>
>
> Can any one help me with this issue

You may need to check dmesg or logs related to the EMC storage.  In
particular, check for LUNs going offline, coming online, or the
multipath device changing state.

Stefan




Re: KVM live migration i/o error

2015-03-23 Thread Stefan Hajnoczi
On Fri, Mar 20, 2015 at 12:34:59PM +0100, Francesc Guasch wrote:
> On Fri, Mar 20, 2015 at 10:03:20AM +, Stefan Hajnoczi wrote:
>
> Hi Stefan, thank you very much for answering me.
>
> > On Wed, Mar 18, 2015 at 04:53:28PM +0100, Francesc Guasch wrote:
> > > I have three Ubuntu Server 14.04 trusty with KVM. Two of
> > > them are HP servers and one is Dell. Both brands run fine
> > > the KVM virtual servers, and I can do live migration between
> > > the HPs. But I get I/O errors in the vda when I migrate to
> > > or from the Dell server.
> > >
> > > I have shared storage with NFS, mounted the same way in all
> > > of them:
> > >
> > > As soon as it starts in the origin console I spot I/O error
> > > messages, when it finishes I got them in the console in the
> > > destination server. The file system is read only and I have to
> > > shut it down hard.
> > >
> > > end request I/O error, /dev/vda, sector 8790327
> >
> > origin console == guest's console?
>
> Yes, I mean I open two consoles with virt-manager, one in
> the origin host and another one in the destination
> >
> > I/O errors starting while the guest is still running on the migration
> > source host is strange.  I wonder if something happened to the NFS file
> > related to file permissions or SELinux labels?
>
> I think I found something checking SELinux. ls -Z and getfattr
> return nothing. But ps -eZ showed something very different
> in the Dell server.
>
> This is in the HP server:
> /usr/sbin/libvirtd        1034 ?  11:51:44 libvirtd
> libvirt-09540b5d-82        701 ?  05:28:40 qemu-system-x86
> unconfined                   1 ?  00:01:00 init
>
> In the Dell server init is confined in lxc and there are also
> lxc-start processes.
>
> /usr/sbin/libvirtd        1622 ?  05:07:07 libvirtd
> libvirt-8a0f9087-32d...  29926 ?  00:00:01 qemu-system-x86
> lxc-container-default     1774 ?  00:00:00 init
> /usr/bin/lxc-start        1763 ?  00:00:00 lxc-start
>
> There is also LXC installed in that server ! Maybe that is messing
> with kvm. The qemu processes look fine to me but there is a chance
> the problem comes from there.
>
> I could move the LXC somewhere else or I can keep it there to
> try to fix this issue. What do you advice I should do now ?

I suggest asking on the libvirt mailing list: libvirt-l...@redhat.com




Re: Windows 7 guest installer does not detect drive if physical partition used instead of disk file.

2015-03-23 Thread Stefan Hajnoczi
On Sat, Mar 21, 2015 at 01:50:46AM +0800, Emmanuel Noobadmin wrote:
> Running
> 3.18.9-200.fc21.x86_64
> qemu 2:2.1.3-3.fc21
> libvirt 1.2.9.2-1.fc21
> System is a Thinkpad X250 with Intel i7-5600u Broadwell GT2
>
> I'm trying to replace the Win7 installation on my laptop with Fedora
> 21 and virtualizing Windows 7 for work purposes. I'd prefer to give
> the guest its own NTFS partition instead of using a file for both
> performance and ease of potential recovery.
>
> So I've set aside unpartitioned space on the hard disk and added
> /dev/sda to the virt-manager storage pool, created a new volume and
> assigned it to the guest as an IDE drive. Unfortunately, the Windows 7
> installer does not see this drive despite being IDE and not virtio.
> If I use a qcow2 file as the drive, the installer has no problems
> detecting it.
>
> To eliminate virt-manager from the equation, I've also tried to do a
> very basic install using virt-install with similar results, the
> physical partition cannot be detected regardless of bus type
> (IDE/SATA/virtio) even with the signed Redhat virtio drivers loaded by
> the installer.
>
> I was unable to find any similar issues or solutions online except a 2
> year old thread on linuxquestions which quoted that we must specify
> the whole disk instead of a partition. However, I cannot find the
> source of that quote.
> http://www.linuxquestions.org/questions/linux-virtualization-and-cloud-90/qemu-kvm-on-a-real-partition-947162/
>
> Is this really the case and the reason why Windows 7 cannot see the
> physical partition or there is something else I am doing wrong?

I have CCed the libvirt mailing list, since KVM is a component here but
your question seems to be mainly about libvirt, virt-manager,
virt-install, etc.

It sounds like you want an NTFS partition on /dev/sda.  That requires
passing the whole /dev/sda drive to the guest - and the Windows
installer might overwrite your GRUB Master Boot Record.  Be careful when
trying to do this.

Also keep in mind that the virtual machine's hardware and your physical
hardware are probably quite different (different chipsets, PCI devices,
etc).  Windows might not be happy booting on the physical host if it was
installed under KVM, and vice versa.  This is known as
physical-to-virtual (p2v) migration and means some tweaks or driver
installs may be necessary to make Windows run after switching.

Stefan




Re: KVM live migration i/o error

2015-03-20 Thread Stefan Hajnoczi
On Wed, Mar 18, 2015 at 04:53:28PM +0100, Francesc Guasch wrote:
> I have three Ubuntu Server 14.04 trusty with KVM. Two of
> them are HP servers and one is Dell. Both brands run fine
> the KVM virtual servers, and I can do live migration between
> the HPs. But I get I/O errors in the vda when I migrate to
> or from the Dell server.
>
> I have shared storage with NFS, mounted the same way in all
> of them:
>
> nfs.sever:/kvm /var/lib/libvirt/images nfs auto,vers=3
>
> I checked the version of all the packages to make sure are
> the same. I got:
>
> kernel: 3.13.0-43-generic #72-Ubuntu SMP x86_64 libvirt:
> libvirt: 1.2.2-0ubuntu13.1.9
> qemu-utils: 2.0.0+dfsg-2ubuntu1.10
> qemu-kvm: 2.0.0+dfsg-2ubuntu1.10
>
> I made sure the Cache in the Storage is set to None.
>
> Disk bus: virtio Cache mode: none IO mode: default
>
> I run this to do live migration:
>
> virsh migrate --live virtual qemu+ssh://dellserver/system
>
> As soon as it starts in the origin console I spot I/O error
> messages, when it finishes I got them in the console in the
> destination server. The file system is read only and I have to
> shut it down hard.
>
> end request I/O error, /dev/vda, sector 8790327

origin console == guest's console?

I/O errors starting while the guest is still running on the migration
source host is strange.  I wonder if something happened to the NFS file
related to file permissions or SELinux labels?

Stefan



