[PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU

2021-09-25 Thread David Dai
Add a virtual PCI device to QEMU that dynamically attaches memory to a VM, so
that a guest driver can request host memory on the fly without help from
virtualization management software such as libvirt/manager. The attached
memory is isolated from system RAM and can be used for heterogeneous memory
management in virtualization. Multiple VMs can dynamically share the memory of
the same computing device without memory overcommit.
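
The idea of a window that starts without backing memory and has host memory
mapped into it later can be illustrated in user space with plain POSIX mmap().
This is only a sketch under stated assumptions, not code from
hw/misc/dynamic_mdev.c: the helper names window_reserve(), window_attach() and
backing_create() are hypothetical, and a temporary file stands in for the host
memory device.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the host memory device: a temp file of `size`. */
static int backing_create(size_t size)
{
    FILE *f = tmpfile();

    if (f == NULL) {
        return -1;
    }
    if (ftruncate(fileno(f), (off_t)size) != 0) {
        fclose(f);
        return -1;
    }
    return fileno(f); /* FILE is left open on purpose so the fd stays valid */
}

/* Reserve an address range with no accessible backing, like an empty BAR. */
static void *window_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Attach `len` bytes of `fd`, starting at `file_off`, at `win_off` inside the
 * reserved window. Offsets must be page-aligned; returns 0 on success.
 */
static int window_attach(void *window, size_t win_off,
                         int fd, off_t file_off, size_t len)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    void *want = (char *)window + win_off;

    assert(window != NULL);
    if (win_off % pg != 0 || (size_t)file_off % pg != 0) {
        return -1;
    }
    /* MAP_FIXED replaces the PROT_NONE reservation at exactly this address. */
    return mmap(want, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, file_off) == want ? 0 : -1;
}
```

A caller would reserve the window once, then call window_attach() each time
another chunk of backing memory should appear at a chosen offset.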

Signed-off-by: David Dai 
---
 docs/devel/dynamic_mdev.rst | 122 ++
 hw/misc/Kconfig |   5 +
 hw/misc/dynamic_mdev.c  | 456 
 hw/misc/meson.build |   1 +
 4 files changed, 584 insertions(+)
 create mode 100644 docs/devel/dynamic_mdev.rst
 create mode 100644 hw/misc/dynamic_mdev.c

diff --git a/docs/devel/dynamic_mdev.rst b/docs/devel/dynamic_mdev.rst
new file mode 100644
index 00..8e2edb6600
--- /dev/null
+++ b/docs/devel/dynamic_mdev.rst
@@ -0,0 +1,122 @@
+Motivation:
+In a heterogeneous computing system, an accelerator generally exposes its
+device memory to the host via PCIe and CXL.mem (Compute Express Link) so that
+host and device can share memory. This memory is generally managed uniformly
+by the host and is called HDM (host-managed device memory); SVA (shared
+virtual addressing) can be built on top of it. One computing device may be
+used by multiple virtual machines if it supports SR-IOV. To use device memory
+efficiently under virtualization, each VM should allocate device memory on
+demand without overcommit, which raises the question of how to dynamically
+attach host memory resources to a VM. A virtual PCI device, dynamic_mdev, is
+introduced for this purpose. dynamic_mdev has a large BAR whose size can be
+set by the user when creating the VM. The BAR has no backing memory at
+initialization; later, the guest driver triggers QEMU to map host memory into
+the BAR space. How much memory is mapped, when, and where are all determined
+by the guest driver. Once device memory has been attached to the virtual PCI
+BAR, applications in the guest can access it through that BAR. Memory
+allocation and negotiation are left to the guest driver and the memory backend
+implementation. dynamic_mdev is a mechanism intended to benefit heterogeneous
+memory virtualization.
+
+Implementation:
+dynamic_mdev has two BARs, BAR0 and BAR2. BAR0 is a 32-bit register BAR that
+hosts the control registers used for control and message communication. BAR2
+is a 64-bit MMIO BAR to which host memory is attached; its size can be set via
+a parameter when creating the VM. Host memory is attached to this BAR via the
+mmap API.
+
+
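+                 VM1                                VM2
+     -------------------------          -------------------------
+    |       application       |        |       application       |
+    |                         |        |                         |
+    |-------------------------|        |-------------------------|
+    |      guest driver       |        |      guest driver       |
+    |   |--------------|      |        |   |--------------|      |
+    |   | pci mem bar  |      |        |   | pci mem bar  |      |
+     ---|--------------|------          ---|--------------|------
+         ----   ----   --                   ----   ----   --
+        |    | |    | |  |                 |    | |    | |  |
+         ----   ----   --                   ----   ----   --
+                    \                          /
+                     \                        /
+                      \                      /
+                       \                    /
+                        |                  |
+                        V                  V
+  ------------------- /dev/mdev.mmap -----------------------
+ |   --   --   --   --   --   --                            |
+ |  |  | |  | |  | |  | |  | |  |  <-free_mem_list         |
+ |   --   --   --   --   --   --                            |
+ |                                                          |
+ |             HDM (host managed device memory)             |
+  ----------------------------------------------------------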
+
+1. Create device:
+-device dyanmic-mdevice,size=0x2,align=0x4000,mem-path=/dev/mdev
+
+size: size of the BAR space
+align: alignment of dynamically attached memory
+mem-path: host backend memory device
+
+
+2. Registers to control dynamic memory attachment
+All registers are placed in BAR0
+
+INT_MASK     = 0,  /* RW */
+INT_STATUS   = 4,  /* RW: write 1 to clear */
+DOOR_BELL    = 8,  /*
+                    * RW: trigger device to act
+                    *  31          15          0
+                    *  ------------------------
+                    * |en|         |    cmd    |
+                    *  ------------------------
+                    */
+
+/* RO: 4K, 2M, or 1G alignment for memory size */
+MEM_ALIGN    = 12,
+
+/* RO: offset in the memory BAR indicating how much BAR space has RAM mapped */
+HW_OFFSET    = 16,
+
+/* RW: size of dynamically attached memory */
+MEM_SIZE     = 24,
+/* RW: offset in host m

[no subject]

2021-09-25 Thread David Dai


Add a virtual PCI device to QEMU. This PCI device is used to dynamically
attach memory to a VM, so that a guest driver can request host memory on the
fly without help from virtualization management software such as
libvirt/manager. The attached memory is isolated from system RAM and can be
used for heterogeneous memory management in virtualization.






[Qemu-devel] [PATCH 1/1] Migration: libvirt live migration over RDMA of ipv6 addr failed

2017-01-24 Thread David Dai
Using libvirt to do live migration over RDMA to an IPv6 address fails.
For example:
# virsh migrate  --live --migrateuri rdma://[deba::]:49152  \
  rhel73_host1_guest1 qemu+ssh://[deba::]/system --verbose
root@deba::'s password:
error: internal error: unable to execute QEMU command 'migrate': RDMA ERROR:  
  could not rdma_getaddrinfo address deba

As we can see, the IPv6 address passed to rdma_getaddrinfo() contains only the
"deba" part. It should be "deba::".

1) According to RFC 3986, a literal IPv6 address in a URI must be enclosed
in '[' and ']'.
When using the virsh command to do live migration to an IPv6 address, the
user enters the address with brackets (e.g. rdma://[deba::]:49152).
libvirt parses this command-line option by calling virURIParse().
The uri passed in to virURIParse() is:
   "uri = rdma://[deba::]:49152"
Inside virURIParse(), virStringStripIPv6Brackets() is called to strip the
brackets '[' and ']' if the host is an IPv6 address. The address is then
saved in the format "deba::" in the virURI->server field and later passed
to qemu.

2) At the beginning of migration, qemu's qemu_rdma_data_init(host_port)
routine calls inet_parse(host_port) to parse the IPv6 address and port
string obtained from libvirt.
The input string host_port passed to qemu_rdma_data_init() can be:
"hostname:port", or
"ipv4address:port", or
"[ipv6address]:port" (e.g. "[deba::]:49152"), or
"ipv6address:port" (e.g. "deba:::49152").
The existing qemu API inet_parse() handles the first three cases properly,
but does not handle the last case ("ipv6address:port") correctly.
In this case of live migration over RDMA to an IPv6 address, the server
address obtained from libvirt does not contain the brackets '[' and ']'
(i.e. "deba:::49152"). This causes inet_parse() to stop at the first colon
':' and parse only the "deba" part. As a result, the subsequent
rdma_getaddrinfo() call with IP address "deba" fails.
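
The truncation described above can be reproduced with a small standalone
sketch. The helper names scan_first_colon() and scan_bracketed() are
hypothetical, and the first scan format is an assumption modelled on a
hostname-style "stop at the first colon" scan. The bracketed format is the one
shown in the diff below. This is not a copy of inet_parse().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan "host:port", ending the host at the FIRST colon.
 * host must hold at least 65 bytes and port at least 33 bytes.
 * Returns 0 if both fields were scanned, -1 otherwise.
 */
static int scan_first_colon(const char *str, char *host, char *port)
{
    assert(str != NULL && host != NULL && port != NULL);
    return sscanf(str, "%64[^:]:%32[^,]", host, port) == 2 ? 0 : -1;
}

/*
 * Hypothetical helper: scan the bracketed form "[ipv6address]:port".
 * Same buffer requirements and return values as scan_first_colon().
 */
static int scan_bracketed(const char *str, char *host, char *port)
{
    assert(str != NULL && host != NULL && port != NULL);
    return sscanf(str, "[%64[^]]]:%32[^,]", host, port) == 2 ? 0 : -1;
}
```

With "deba:::49152", scan_first_colon() reports success but leaves host as
"deba" and port as "::49152". With "[deba::]:49152", scan_bracketed() yields
host "deba::" and port "49152".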

If we did not strip the brackets '[' and ']' from an IPv6 address in libvirt's
virURIParse(), libvirt's IPv6 ssh authentication would fail.

NOTE:
Using libvirt to do live migration over TCP to an IPv6 address works fine:
# virsh migrate  --live --migrateuri tcp://[deba::]:49152  \
  rhel73_host1_guest1 qemu+ssh://[deba::]/system --verbose
With a tcp migrateuri, libvirt calls virNetSocketNewConnectTCP() directly to
connect to the remote "deba:::49152" after it strips the brackets '[' and ']'
from the IPv6 address.
On the qemu side, fd_start_outgoing_migration() is called to do the migration.
It does not call inet_parse(), so the issue does not appear in the tcp case.

Solution:
Fix qemu's inet_parse() routine so that it properly parses an IPv6 address
without brackets (i.e. the "deba:::49152" format).
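
The approach can be sketched in isolation as splitting the string at the last
colon and cutting the port at the first comma. The helper name
split_at_last_colon() is hypothetical, and the sketch uses only the standard C
library rather than qemu's pstrcpy(). It assumes that nothing after the port
contains a colon.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split "ipv6address:port[,options]" at the LAST colon.
 * Returns 0 on success, -1 if there is no colon or a buffer is too small.
 */
static int split_at_last_colon(const char *str,
                               char *host, size_t host_len,
                               char *port, size_t port_len)
{
    const char *last;
    size_t hlen;
    size_t plen;

    assert(str != NULL && host != NULL && port != NULL);

    last = strrchr(str, ':');
    if (last == NULL) {
        return -1;
    }
    hlen = (size_t)(last - str);     /* everything before the last colon */
    plen = strcspn(last + 1, ",");   /* port ends at the first ',' or NUL */
    if (hlen >= host_len || plen >= port_len) {
        return -1;
    }
    memcpy(host, str, hlen);
    host[hlen] = '\0';
    memcpy(port, last + 1, plen);
    port[plen] = '\0';
    return 0;
}
```

For "deba:::49152" this gives host "deba::" and port "49152", which is the
address rdma_getaddrinfo() needs.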

Signed-off-by: David Dai <z...@linux.vnet.ibm.com>
---
 util/qemu-sockets.c |   28 +++-
 1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 7c120c4..e09191c 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -584,6 +584,8 @@ InetSocketAddress *inet_parse(const char *str, Error **errp)
     char port[33];
     int to;
     int pos;
+    char *first_col_p = strchr(str, ':');
+    char *last_col_p = strrchr(str, ':');
 
     addr = g_new0(InetSocketAddress, 1);
 
@@ -595,11 +597,27 @@ InetSocketAddress *inet_parse(const char *str, Error **errp)
             error_setg(errp, "error parsing port in address '%s'", str);
             goto fail;
         }
-    } else if (str[0] == '[') {
-        /* IPv6 addr */
-        if (sscanf(str, "[%64[^]]]:%32[^,]%n", host, port, &pos) != 2) {
-            error_setg(errp, "error parsing IPv6 address '%s'", str);
-            goto fail;
+    } else if (first_col_p != last_col_p) {
+        if (str[0] != '[') {
+            /* IPv6 addr w/o brackets */
+            char *port_p;
+            char *comma_p;
+
+            pstrcpy(host, sizeof(host), str);
+            port_p = strrchr(host, ':');
+            *port_p++ = '\0';
+            pstrcpy(port, sizeof(port), port_p);
+            comma_p = strchr(port, ',');
+            if (comma_p != NULL) {
+                *comma_p = '\0';
+            }
+            pos = strlen(host) + strlen(port) + 1;
+        } else {
+            /* IPv6 addr with brackets */
+            if (2 != sscanf(str, "[%64[^]]]:%32[^,]%n", host, port, &pos)) {
+                error_setg(errp, "error parsing IPv6 address '%s'", str);
+                goto fail;
+            }
         }
         addr->ipv6 = addr->has_ipv6 = true;
     } else {
-- 
1.7.1