Re: [PATCH 1/3] Provide simple noop dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:35PM +0100, Christian Borntraeger wrote:
> +static void *dma_noop_alloc(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp,
> + struct dma_attrs *attrs)
> +{
> + void *ret;
> +
> + ret = (void *)__get_free_pages(gfp, get_order(size));
> + if (ret) {
> + memset(ret, 0, size);

There is no need to zero out the memory here. If the user wants
initialized memory it can call dma_zalloc_coherent. Having the memset
here means to clear the memory twice in the dma_zalloc_coherent path.

Otherwise it looks good.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 12:13, Xiao Guangrong wrote:

It is used to get the size of the specified file, also qemu_fd_getlength()
is introduced to unify the code with raw_getlength() in block/raw-posix.c

Signed-off-by: Xiao Guangrong 
---
  block/raw-posix.c|  7 +--
  include/qemu/osdep.h |  2 ++
  util/osdep.c | 31 +++
  3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 918c756..734e6dd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
  {
  BDRVRawState *s = bs->opaque;
  int ret;
-int64_t size;
  
  ret = fd_open(bs);

  if (ret < 0) {
  return ret;
  }
  
-size = lseek(s->fd, 0, SEEK_END);

-if (size < 0) {
-return -errno;
-}
-return size;
+return qemu_fd_getlength(s->fd);
  }
  #endif
  
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h

index dbc17dc..ca4c3fa 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
  pid_t qemu_fork(Error **errp);
  
  size_t qemu_file_get_page_size(const char *mem_path, Error **errp);

+int64_t qemu_fd_getlength(int fd);
+size_t qemu_file_getlength(const char *file, Error **errp);
  #endif
diff --git a/util/osdep.c b/util/osdep.c
index 0092bb6..5a61e19 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
  return readv_writev(fd, iov, iov_cnt, true);
  }
  #endif
+
+int64_t qemu_fd_getlength(int fd)
+{
+int64_t size;
+
+size = lseek(fd, 0, SEEK_END);
+if (size < 0) {
+return -errno;
+}
+return size;
+}
+
+size_t qemu_file_getlength(const char *file, Error **errp)
+{
+int64_t size;
+int fd = qemu_open(file, O_RDONLY);
+
+if (fd < 0) {
+error_setg_file_open(errp, errno, file);
+return 0;
+}
+
+size = qemu_fd_getlength(fd);
+if (size < 0) {
+error_setg_errno(errp, -size, "can't get size of file %s", file);
+size = 0;
+}
+
+qemu_close(fd);
+return size;
+}


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 0/3] dma ops and virtio

2015-11-02 Thread Sebastian Ott
Hi,

On Fri, 30 Oct 2015, Christian Borntraeger wrote:

> here is the 2nd version of providing an DMA API for s390.
> 
> There are some attempts to unify the dma ops (Christoph) as well
> as some attempts to make virtio use the dma API (Andy).
> 
> At kernel summit we concluded that we want to use the same code on all
> platforms, whereever possible, so having a dummy dma_op might be the
> easiest solution to keep virtio-ccw as similar as possible to
> virtio-pci.Together with a fixed up patch set from Andy Lutomirski
> this seems to work.  
> 
> We will also need a fixup for powerc and QEMU changes to make virtio
> work with iommu on power and x86.
> 
> TODO:
> - future add-on patches to also fold in x86 no iommu
>   - dma_mask
>   - checking?
> - make compilation of dma-noop dependent on something
> 
> v1->v2:
> - initial testing
> - always use dma_noop_ops if device has no private dma_ops
> - get rid of setup in virtio_ccw,kvm_virtio
> - set CONFIG_HAS_DMA(ATTRS) for virtio (fixes compile for !PCI)
> - rename s390_dma_ops to s390_pci_dma_ops
> 
> Christian Borntraeger (3):
>   Provide simple noop dma ops
>   alpha: use common noop dma ops
>   s390/dma: Allow per device dma ops
> 
>  arch/alpha/kernel/pci-noop.c| 46 ++
>  arch/s390/Kconfig   |  3 +-
>  arch/s390/include/asm/device.h  |  6 ++-
>  arch/s390/include/asm/dma-mapping.h |  6 ++-
>  arch/s390/pci/pci.c |  1 +
>  arch/s390/pci/pci_dma.c |  4 +-
>  include/linux/dma-mapping.h |  2 +
>  lib/Makefile|  2 +-
>  lib/dma-noop.c  | 77 
> +
>  9 files changed, 98 insertions(+), 49 deletions(-)
>  create mode 100644 lib/dma-noop.c
> 
> -- 

I agree with these changes in principle. As long as we don't do MMIO
(writel and friends) on a dummy mapping we're fine.

Regards,
Sebastian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 18:22, Xiao Guangrong wrote:



On 11/02/2015 10:51 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:
Currently file_ram_alloc() is designed for hugetlbfs, however, the 
memory
of nvdimm can come from either raw pmem device eg, /dev/pmem, or the 
file

locates at DAX enabled filesystem

So this patch let it work on any kind of path


No, this patch doesn't change any logic, but only fix variable name 
and some error messages.


Yes, it is.

'let it work' in my thought exactly was "fix variable name and some 
error messages"... okay,

if it confused you, how about change it to:

"This patch fixes variable name and some error messages to let it be 
aware of normal

path"


My english is not very good, I don't know figures of speech. For me 
"patch let it work" means that without this patch it will not work))
Your new variant is ok for me, or better (imo) "This patch fixes 
variable name and some error messages to be suitable for any kind of path"








Signed-off-by: Xiao Guangrong 
---
  exec.c | 24 
  1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9de38be..9075f4d 100644
--- a/exec.c
+++ b/exec.c
@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
  char *c;
  void *area;
  int fd;
-uint64_t hpagesize;
+uint64_t pagesize;
  Error *local_err = NULL;
-hpagesize = qemu_file_get_page_size(path, _err);
+pagesize = qemu_file_get_page_size(path, _err);
  if (local_err) {
  error_propagate(errp, local_err);
  goto error;
  }
-if (hpagesize == getpagesize()) {
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+if (pagesize == getpagesize()) {
+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");


Why do you remove path from error message? It is good additional 
information.. What if we have

several memory file backends?


Good catch, will change it to:
fprintf(stderr, "Memory is not allocated from HugeTlbfs on path 
%s.\n", path);


BTW, if no other big change in the further, i will post the new 
version just for of this patch,





  }
-block->mr->align = hpagesize;
+block->mr->align = pagesize;
-if (memory < hpagesize) {
+if (memory < pagesize) {
  error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be 
equal to "

-   "or larger than huge page size 0x%" PRIx64,
-   memory, hpagesize);
+   "or larger than page size 0x%" PRIx64,
+   memory, pagesize);
  goto error;
  }
@@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
  fd = mkstemp(filename);
  if (fd < 0) {
  error_setg_errno(errp, errno,
- "unable to create backing store for 
hugepages");
+ "unable to create backing store for path 
%s", path);

  g_free(filename);
  goto error;
  }
  unlink(filename);
  g_free(filename);
-memory = ROUND_UP(memory, hpagesize);
+memory = ROUND_UP(memory, pagesize);
  /*
   * ftruncate is not supported by hugetlbfs in older
@@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
  perror("ftruncate");
  }
-area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & 
RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, pagesize, block->flags & 
RAM_SHARED);

  if (area == MAP_FAILED) {
  error_setg_errno(errp, errno,
- "unable to map backing store for hugepages");
+ "unable to map backing store for path %s", 
path);

  close(fd);
  goto error;
  }





With these two fixes (any commit message variant):

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 16:08, Xiao Guangrong wrote:



On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Curretly, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Current code did not check the return value of get_memory_region as it
assumed the backend memory of pc-dimm is always properly initialized,
we make get_memory_region internally catch the case if something is
wrong


but here you call not pc-dimm's get_memory_region, but common 
ddc->get_memory_region, which may be nvdimm or possibly other future 
dimm, so, why not check it here? And than pc_dimm_get_memory_region may 
be left untouched (error_abort is ok, because errp is unused).




Signed-off-by: Xiao Guangrong 
---
  hw/mem/dimm.c|  3 ++-
  hw/mem/pc-dimm.c | 12 +++-
  2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 4a63409..498d380 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor 
*v, void *opaque,

  int64_t value;
  MemoryRegion *mr;
  DIMMDevice *dimm = DIMM(obj);
+DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
-mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+mr = ddc->get_memory_region(dimm);
  value = memory_region_size(mr);
  visit_type_int(v, , name, errp);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 38323e9..e6b6a9f 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -22,7 +22,17 @@
  static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
  {
-return host_memory_backend_get_memory(dimm->hostmem, 
_abort);

+Error *local_err = NULL;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(dimm->hostmem, _err);
+
+/*
+ * plug a pc-dimm device whose backend memory was not properly
+ * initialized?
+ */
+assert(!local_err && mr);
+return mr;
  }


this should squashed into previous patch, I think


You mean merger this patch with 19/35 (dimm: abstract dimm device from 
pc-dimm)?
The 19/35 mostly ‘moves’ the things, this one changes the core logic, 
it is not

a big deal. :D


stupid me, you are right)






  static void pc_dimm_class_init(ObjectClass *oc, void *data)


I've discovered suddenly, that

MemoryRegion *
host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
{
 return memory_region_size(>mr) ? >mr : NULL;
}

- it doesn't use errp at all. In my opinion, this should be fixed 
globally, by deleting useless
parameter in separate patch. Or just squash your function into 
previous patch.




Yup, this is a globally interface so i prefer to make a separate patch 
to do the

cleanup after this patchset.




--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 12:13, Xiao Guangrong wrote:

Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
locates at DAX enabled filesystem

So this patch let it work on any kind of path


No, this patch doesn't change any logic, but only fix variable name and 
some error messages.




Signed-off-by: Xiao Guangrong 
---
  exec.c | 24 
  1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9de38be..9075f4d 100644
--- a/exec.c
+++ b/exec.c
@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
  char *c;
  void *area;
  int fd;
-uint64_t hpagesize;
+uint64_t pagesize;
  Error *local_err = NULL;
  
-hpagesize = qemu_file_get_page_size(path, _err);

+pagesize = qemu_file_get_page_size(path, _err);
  if (local_err) {
  error_propagate(errp, local_err);
  goto error;
  }
  
-if (hpagesize == getpagesize()) {

-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+if (pagesize == getpagesize()) {
+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");


Why do you remove path from error message? It is good additional 
information.. What if we have several memory file backends?



  }
  
-block->mr->align = hpagesize;

+block->mr->align = pagesize;
  
-if (memory < hpagesize) {

+if (memory < pagesize) {
  error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
-   "or larger than huge page size 0x%" PRIx64,
-   memory, hpagesize);
+   "or larger than page size 0x%" PRIx64,
+   memory, pagesize);
  goto error;
  }
  
@@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,

  fd = mkstemp(filename);
  if (fd < 0) {
  error_setg_errno(errp, errno,
- "unable to create backing store for hugepages");
+ "unable to create backing store for path %s", path);
  g_free(filename);
  goto error;
  }
  unlink(filename);
  g_free(filename);
  
-memory = ROUND_UP(memory, hpagesize);

+memory = ROUND_UP(memory, pagesize);
  
  /*

   * ftruncate is not supported by hugetlbfs in older
@@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
  perror("ftruncate");
  }
  
-area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);

+area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
  if (area == MAP_FAILED) {
  error_setg_errno(errp, errno,
- "unable to map backing store for hugepages");
+ "unable to map backing store for path %s", path);
  close(fd);
  goto error;
  }



--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 07/35] util: introduce qemu_file_get_page_size()

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 12:13, Xiao Guangrong wrote:

There are three places use the some logic to get the page size on
the file path or file fd

Windows did not support file hugepage, so it will return normal page
for this case. And this interface has not been used on windows so far
  
This patch introduces qemu_file_get_page_size() to unify the code


Signed-off-by: Xiao Guangrong 
---
  exec.c   | 33 ++---
  include/qemu/osdep.h |  1 +
  target-ppc/kvm.c | 23 +--
  util/oslib-posix.c   | 37 +
  util/oslib-win32.c   |  5 +
  5 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/exec.c b/exec.c
index 8af2570..9de38be 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
  }
  
  #ifdef __linux__

-
-#include 
-
-#define HUGETLBFS_MAGIC   0x958458f6
-
-static long gethugepagesize(const char *path, Error **errp)
-{
-struct statfs fs;
-int ret;
-
-do {
-ret = statfs(path, );
-} while (ret != 0 && errno == EINTR);
-
-if (ret != 0) {
-error_setg_errno(errp, errno, "failed to get page size of file %s",
- path);
-return 0;
-}
-
-if (fs.f_type != HUGETLBFS_MAGIC)
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
-
-return fs.f_bsize;
-}
-
  static void *file_ram_alloc(RAMBlock *block,
  ram_addr_t memory,
  const char *path,
@@ -1213,11 +1187,16 @@ static void *file_ram_alloc(RAMBlock *block,
  uint64_t hpagesize;
  Error *local_err = NULL;
  
-hpagesize = gethugepagesize(path, _err);

+hpagesize = qemu_file_get_page_size(path, _err);
  if (local_err) {
  error_propagate(errp, local_err);
  goto error;
  }
+
+if (hpagesize == getpagesize()) {
+fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+}
+
  block->mr->align = hpagesize;
  
  if (memory < hpagesize) {

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b568424..dbc17dc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
   */
  pid_t qemu_fork(Error **errp);
  
+size_t qemu_file_get_page_size(const char *mem_path, Error **errp);

  #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ac70f08..d8760ea 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -308,28 +308,15 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct 
kvm_ppc_smmu_info *info)
  
  static long gethugepagesize(const char *mem_path)

  {
-struct statfs fs;
-int ret;
-
-do {
-ret = statfs(mem_path, );
-} while (ret != 0 && errno == EINTR);
+Error *local_err = NULL;
+long size = qemu_file_get_page_size(mem_path, local_err);
  
-if (ret != 0) {

-fprintf(stderr, "Couldn't statfs() memory path: %s\n",
-strerror(errno));
+if (local_err) {
+error_report_err(local_err);
  exit(1);
  }
  
-#define HUGETLBFS_MAGIC   0x958458f6

-
-if (fs.f_type != HUGETLBFS_MAGIC) {
-/* Explicit mempath, but it's ordinary pages */
-return getpagesize();
-}
-
-/* It's hugepage, return the huge page size */
-return fs.f_bsize;
+return size;
  }
  
  static int find_max_supported_pagesize(Object *obj, void *opaque)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 914cef5..51437ff 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
  siglongjmp(sigjump, 1);
  }
  
-static size_t fd_getpagesize(int fd)

+static size_t fd_getpagesize(int fd, Error **errp)
  {
  #ifdef CONFIG_LINUX
  struct statfs fs;
@@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
  ret = fstatfs(fd, );
  } while (ret != 0 && errno == EINTR);
  
-if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {

+if (ret) {
+error_setg_errno(errp, errno, "fstatfs is failed");
+return 0;
+}
+
+if (fs.f_type == HUGETLBFS_MAGIC) {
  return fs.f_bsize;
  }
  }
@@ -360,6 +365,22 @@ static size_t fd_getpagesize(int fd)
  return getpagesize();
  }
  
+size_t qemu_file_get_page_size(const char *path, Error **errp)

+{
+size_t size = 0;
+int fd = qemu_open(path, O_RDONLY);
+
+if (fd < 0) {
+error_setg_file_open(errp, errno, path);
+goto exit;
+}
+
+size = fd_getpagesize(fd, errp);
+qemu_close(fd);
+exit:
+return size;
+}
+
  void os_mem_prealloc(int fd, char *area, size_t memory)
  {
  int ret;
@@ -387,8 +408,16 @@ void os_mem_prealloc(int fd, char *area, size_t memory)
  exit(1);
  } else {
  int i;
-size_t hpagesize = fd_getpagesize(fd);
-size_t numpages = 

Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 12:13, Xiao Guangrong wrote:

Currently, file_ram_alloc() only works on directory - it creates a file
under @path and do mmap on it

This patch tries to allow it to work on file directly, if @path is a


It isn't try to allow, it allows, as I understand)


directory it works as before, otherwise it treats @path as the target
file then directly allocate memory from it

Signed-off-by: Xiao Guangrong 
---
  exec.c | 80 ++
  1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 9075f4d..db0fdaf 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
  }
  
  #ifdef __linux__

+static bool path_is_dir(const char *path)
+{
+struct stat fs;
+
+return stat(path, ) == 0 && S_ISDIR(fs.st_mode);
+}
+
+static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
+{
+char *filename;
+char *sanitized_name;
+char *c;
+int fd;
+
+if (!path_is_dir(path)) {
+int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
+
+flags |= O_EXCL;
+return open(path, flags);
+}
+
+/* Make name safe to use with mkstemp by replacing '/' with '_'. */
+sanitized_name = g_strdup(memory_region_name(block->mr));
+for (c = sanitized_name; *c != '\0'; c++) {
+if (*c == '/') {
+*c = '_';
+}
+}
+filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
+   sanitized_name);
+g_free(sanitized_name);


one empty line will be very nice here, and it was in master branch


+fd = mkstemp(filename);
+if (fd >= 0) {
+unlink(filename);
+/*
+ * ftruncate is not supported by hugetlbfs in older
+ * hosts, so don't bother bailing out on errors.
+ * If anything goes wrong with it under other filesystems,
+ * mmap will fail.
+ */
+if (ftruncate(fd, size)) {
+perror("ftruncate");
+}
+}
+g_free(filename);
+
+return fd;
+}
+
  static void *file_ram_alloc(RAMBlock *block,
  ram_addr_t memory,
  const char *path,
  Error **errp)
  {
-char *filename;
-char *sanitized_name;
-char *c;
  void *area;
  int fd;
  uint64_t pagesize;
@@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
  goto error;
  }
  
-/* Make name safe to use with mkstemp by replacing '/' with '_'. */

-sanitized_name = g_strdup(memory_region_name(block->mr));
-for (c = sanitized_name; *c != '\0'; c++) {
-if (*c == '/')
-*c = '_';
-}
-
-filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
-   sanitized_name);
-g_free(sanitized_name);
+memory = ROUND_UP(memory, pagesize);
  
-fd = mkstemp(filename);

+fd = open_ram_file_path(block, path, memory);
  if (fd < 0) {
  error_setg_errno(errp, errno,
   "unable to create backing store for path %s", path);
-g_free(filename);
  goto error;
  }
-unlink(filename);
-g_free(filename);
-
-memory = ROUND_UP(memory, pagesize);
-
-/*
- * ftruncate is not supported by hugetlbfs in older
- * hosts, so don't bother bailing out on errors.
- * If anything goes wrong with it under other filesystems,
- * mmap will fail.
- */
-if (ftruncate(fd, memory)) {
-perror("ftruncate");
-}
  
  area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);

  if (area == MAP_FAILED) {



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Dimitri John Ledkov
On 2 November 2015 at 14:58, Will Deacon  wrote:
> On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote:
>> Hi,
>
> Hello Andre,
>
>> this series cleans up kvmtool's kernel loading functionality a bit.
>> It has been broken out of a previous series I sent [1] and contains
>> just the cleanup and bug fix parts, which should be less controversial
>> and thus easier to merge ;-)
>> I will resend the pipe loading part later on as a separate series.
>>
>> The first patch properly abstracts kernel loading to move
>> responsibility into each architecture's code. It removes quite some
>> ugly code from the generic kvm.c file.
>> The later patches address the naive usage of read(2) to, well, read
>> data from files. Doing this without coping with the subtleties of
>> the UNIX read semantics (returning with less or none data read is not
>> an error) can provoke hard to debug failures.
>> So these patches make use of the existing and one new wrapper function
>> to make sure we read everything we actually wanted to.
>> The last patch moves the ARM kernel loading code into the proper
>> location to be in line with the other architectures.
>>
>> Please have a look and give some comments!
>
> Looks good to me, but I'd like to see some comments from some mips/ppc/x86
> people on the changes you're making over there.

Looks mostly good to me, as one of the kvmtool down streams. Over at
https://github.com/clearlinux/kvmtool we have some patches to tweak
the x86 boot flow, which will need rebasing/retweaking) specifically
this commit here -
https://github.com/clearlinux/kvmtool/commit/a8dee709f85735d16739d0eda0cc00d3c1b17477

-- 
Regards,

Dimitri.
53 sleeps till Christmas, or less

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Dimitri John Ledkov
On 2 November 2015 at 14:58, Will Deacon  wrote:
> On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote:
>> Hi,
>
> Hello Andre,
>
>> this series cleans up kvmtool's kernel loading functionality a bit.
>> It has been broken out of a previous series I sent [1] and contains
>> just the cleanup and bug fix parts, which should be less controversial
>> and thus easier to merge ;-)
>> I will resend the pipe loading part later on as a separate series.
>>
>> The first patch properly abstracts kernel loading to move
>> responsibility into each architecture's code. It removes quite some
>> ugly code from the generic kvm.c file.
>> The later patches address the naive usage of read(2) to, well, read
>> data from files. Doing this without coping with the subtleties of
>> the UNIX read semantics (returning with less or none data read is not
>> an error) can provoke hard to debug failures.
>> So these patches make use of the existing and one new wrapper function
>> to make sure we read everything we actually wanted to.
>> The last patch moves the ARM kernel loading code into the proper
>> location to be in line with the other architectures.
>>
>> Please have a look and give some comments!
>
> Looks good to me, but I'd like to see some comments from some mips/ppc/x86
> people on the changes you're making over there.

Looks mostly good to me, as one of the kvmtool down streams. Over at
https://github.com/clearlinux/kvmtool we have some patches to tweak
the x86 boot flow, which will need rebasing/retweaking) specifically
this commit here -
https://github.com/clearlinux/kvmtool/commit/a8dee709f85735d16739d0eda0cc00d3c1b17477

-- 
Regards,

Dimitri.
53 sleeps till Christmas, or less

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] alpha: use common noop dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:36PM +0100, Christian Borntraeger wrote:
> Some of the alpha pci noop dma ops are identical to the common ones.
> Use them.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  arch/alpha/kernel/pci-noop.c | 46 
> 
>  1 file changed, 4 insertions(+), 42 deletions(-)

Reviewed-by: Joerg Roedel 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path

2015-11-02 Thread Xiao Guangrong



On 11/02/2015 10:51 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
locates at DAX enabled filesystem

So this patch let it work on any kind of path


No, this patch doesn't change any logic, but only fix variable name and some 
error messages.


Yes, it is.

'let it work' in my thought exactly was "fix variable name and some error 
messages"... okay,
if it confused you, how about change it to:

"This patch fixes variable name and some error messages to let it be aware of 
normal
path"





Signed-off-by: Xiao Guangrong 
---
  exec.c | 24 
  1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9de38be..9075f4d 100644
--- a/exec.c
+++ b/exec.c
@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
  char *c;
  void *area;
  int fd;
-uint64_t hpagesize;
+uint64_t pagesize;
  Error *local_err = NULL;
-hpagesize = qemu_file_get_page_size(path, _err);
+pagesize = qemu_file_get_page_size(path, _err);
  if (local_err) {
  error_propagate(errp, local_err);
  goto error;
  }
-if (hpagesize == getpagesize()) {
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+if (pagesize == getpagesize()) {
+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");


Why do you remove path from error message? It is good additional information.. 
What if we have
several memory file backends?


Good catch, will change it to:
fprintf(stderr, "Memory is not allocated from HugeTlbfs on path %s.\n", path);

BTW, if no other big change in the further, i will post the new version just 
for of this patch,




  }
-block->mr->align = hpagesize;
+block->mr->align = pagesize;
-if (memory < hpagesize) {
+if (memory < pagesize) {
  error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
-   "or larger than huge page size 0x%" PRIx64,
-   memory, hpagesize);
+   "or larger than page size 0x%" PRIx64,
+   memory, pagesize);
  goto error;
  }
@@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
  fd = mkstemp(filename);
  if (fd < 0) {
  error_setg_errno(errp, errno,
- "unable to create backing store for hugepages");
+ "unable to create backing store for path %s", path);
  g_free(filename);
  goto error;
  }
  unlink(filename);
  g_free(filename);
-memory = ROUND_UP(memory, hpagesize);
+memory = ROUND_UP(memory, pagesize);
  /*
   * ftruncate is not supported by hugetlbfs in older
@@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
  perror("ftruncate");
  }
-area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
  if (area == MAP_FAILED) {
  error_setg_errno(errp, errno,
- "unable to map backing store for hugepages");
+ "unable to map backing store for path %s", path);
  close(fd);
  goto error;
  }




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-02 Thread Haozhong Zhang
On Mon, Nov 02, 2015 at 09:40:18AM +, James Hogan wrote:
> On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> > The value of the migrated vcpu's TSC rate is determined as below.
> >  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> > user-specified value will be used.
> >  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> > present, we will use the TSC rate from KVM (returned by
> > KVM_GET_TSC_KHZ).
> >  3. Otherwise, we will use the migrated TSC rate.
> > 
> > Signed-off-by: Haozhong Zhang 
> > ---
> >  include/sysemu/kvm.h |  2 ++
> >  kvm-all.c|  1 +
> >  target-arm/kvm.c |  5 +
> >  target-i386/kvm.c| 33 +
> >  target-mips/kvm.c|  5 +
> >  target-ppc/kvm.c |  5 +
> >  target-s390x/kvm.c   |  5 +
> >  7 files changed, 56 insertions(+)
> > 
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index 461ef65..0ec8b98 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -328,6 +328,8 @@ int kvm_arch_fixup_msi_route(struct 
> > kvm_irq_routing_entry *route,
> >  
> >  int kvm_arch_msi_data_to_gsi(uint32_t data);
> >  
> > +int kvm_arch_setup_tsc_khz(CPUState *cpu);
> > +
> >  int kvm_set_irq(KVMState *s, int irq, int level);
> >  int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
> >  
> > diff --git a/kvm-all.c b/kvm-all.c
> > index c442838..1ecaf04 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -1757,6 +1757,7 @@ static void do_kvm_cpu_synchronize_post_init(void 
> > *arg)
> >  {
> >  CPUState *cpu = arg;
> >  
> > +kvm_arch_setup_tsc_khz(cpu);
> 
> Sorry if this is a stupid question, but why aren't you doing this from
> the i386 kvm_arch_put_registers when level == KVM_PUT_FULL_STATE, rather
> than introducing x86 specifics to the generic KVM api?
> 
> Cheers
> James
>

Yes, I could call kvm_arch_setup_tsc_khz() in kvm_arch_put_registers()
of target-i386 when level == KVM_PUT_FULL_STATE, so that I will not need
to make kvm_arch_setup_tsc_khz() a generic KVM API (which looks weird
for targets other than i386).

Thanks, James!

Haozhong

> >  kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
> >  cpu->kvm_vcpu_dirty = false;
> >  }
> > diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> > index 79ef4c6..a724f6d 100644
> > --- a/target-arm/kvm.c
> > +++ b/target-arm/kvm.c
> > @@ -614,3 +614,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> >  return (data - 32) & 0x;
> >  }
> > +
> > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > +{
> > +return 0;
> > +}
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index 64046cb..aae5e58 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> >  abort();
> >  }
> > +
> > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > +{
> > +X86CPU *cpu = X86_CPU(cs);
> > +CPUX86State *env = >env;
> > +int r;
> > +
> > +/*
> > + * Prepare vcpu's TSC rate to be migrated.
> > + *
> > + * - If the user specifies the TSC rate by cpu option 'tsc-freq',
> > + *   we will use the user-specified value.
> > + *
> > + * - If there is neither user-specified TSC rate nor migrated TSC
> > + *   rate, we will ask KVM for the TSC rate by calling
> > + *   KVM_GET_TSC_KHZ.
> > + *
> > + * - Otherwise, if there is a migrated TSC rate, we will use the
> > + *   migrated value.
> > + */
> > +if (env->tsc_khz) {
> > +env->tsc_khz_saved = env->tsc_khz;
> > +} else if (!env->tsc_khz_saved) {
> > +r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> > +if (r < 0) {
> > +fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
> > +return r;
> > +}
> > +env->tsc_khz_saved = r;
> > +}
> > +
> > +return 0;
> > +}
> > diff --git a/target-mips/kvm.c b/target-mips/kvm.c
> > index 12d7db3..fb26d7e 100644
> > --- a/target-mips/kvm.c
> > +++ b/target-mips/kvm.c
> > @@ -687,3 +687,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> >  abort();
> >  }
> > +
> > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > +{
> > +return 0;
> > +}
> > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > index ac70f08..c429f0c 100644
> > --- a/target-ppc/kvm.c
> > +++ b/target-ppc/kvm.c
> > @@ -2510,3 +2510,8 @@ int kvmppc_enable_hwrng(void)
> >  
> >  return kvmppc_enable_hcall(kvm_state, H_RANDOM);
> >  }
> > +
> > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > +{
> > +return 0;
> > +}
> > diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> > index c3be180..db5d436 100644
> > --- a/target-s390x/kvm.c
> > +++ b/target-s390x/kvm.c
> > @@ -2248,3 +2248,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> >  {
> >  abort();
> >  }
> > +
> > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > +{
> > +return 0;
> > +}
> > -- 
> > 2.4.8
> > 


--
To 

Re: [PATCH 1/3] KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic

2015-11-02 Thread Radim Krcmar
2015-11-02 13:21+0100, Paolo Bonzini:
> We do not want to do too much work in atomic context, in particular
> not walking all the VCPUs of the virtual machine.  So we want
> to distinguish the architecture-specific injection function for irqfd
> from kvm_set_msi.  Since it's still empty, reuse the newly added
> kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.

kvm/queue uses kvm_arch_set_irq since b7184313f4b9 ("kvm/x86: Hyper-V
synthetic interrupt controller").

Is synic going to be dropped before this patch is merged?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file

2015-11-02 Thread Xiao Guangrong



On 11/02/2015 11:12 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Currently, file_ram_alloc() only works on directory - it creates a file
under @path and do mmap on it

This patch tries to allow it to work on file directly, if @path is a


It isn't try to allow, it allows, as I understand)...


Err... Sorry for my English, but what is the different between:
”This patch tries to allow it to work on file directly“ and
"This patch allows it to work on file directly"

:(




directory it works as before, otherwise it treats @path as the target
file then directly allocate memory from it

Signed-off-by: Xiao Guangrong 
---
  exec.c | 80 ++
  1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 9075f4d..db0fdaf 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
  }
  #ifdef __linux__
+static bool path_is_dir(const char *path)
+{
+struct stat fs;
+
+return stat(path, ) == 0 && S_ISDIR(fs.st_mode);
+}
+
+static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
+{
+char *filename;
+char *sanitized_name;
+char *c;
+int fd;
+
+if (!path_is_dir(path)) {
+int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
+
+flags |= O_EXCL;
+return open(path, flags);
+}
+
+/* Make name safe to use with mkstemp by replacing '/' with '_'. */
+sanitized_name = g_strdup(memory_region_name(block->mr));
+for (c = sanitized_name; *c != '\0'; c++) {
+if (*c == '/') {
+*c = '_';
+}
+}
+filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
+   sanitized_name);
+g_free(sanitized_name);


one empty line will be very nice here, and it was in master branch


+fd = mkstemp(filename);
+if (fd >= 0) {
+unlink(filename);
+/*
+ * ftruncate is not supported by hugetlbfs in older
+ * hosts, so don't bother bailing out on errors.
+ * If anything goes wrong with it under other filesystems,
+ * mmap will fail.
+ */
+if (ftruncate(fd, size)) {
+perror("ftruncate");
+}
+}
+g_free(filename);
+
+return fd;
+}
+
  static void *file_ram_alloc(RAMBlock *block,
  ram_addr_t memory,
  const char *path,
  Error **errp)
  {
-char *filename;
-char *sanitized_name;
-char *c;
  void *area;
  int fd;
  uint64_t pagesize;
@@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
  goto error;
  }
-/* Make name safe to use with mkstemp by replacing '/' with '_'. */
-sanitized_name = g_strdup(memory_region_name(block->mr));
-for (c = sanitized_name; *c != '\0'; c++) {
-if (*c == '/')
-*c = '_';
-}
-
-filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
-   sanitized_name);
-g_free(sanitized_name);
+memory = ROUND_UP(memory, pagesize);
-fd = mkstemp(filename);
+fd = open_ram_file_path(block, path, memory);
  if (fd < 0) {
  error_setg_errno(errp, errno,
   "unable to create backing store for path %s", path);
-g_free(filename);
  goto error;
  }
-unlink(filename);
-g_free(filename);
-
-memory = ROUND_UP(memory, pagesize);
-
-/*
- * ftruncate is not supported by hugetlbfs in older
- * hosts, so don't bother bailing out on errors.
- * If anything goes wrong with it under other filesystems,
- * mmap will fail.
- */
-if (ftruncate(fd, memory)) {
-perror("ftruncate");
-}
  area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
  if (area == MAP_FAILED) {



Reviewed-by: Vladimir Sementsov-Ogievskiy 


Thanks for your review.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Joerg Roedel
On Thu, Oct 29, 2015 at 09:32:32AM +0200, Shamir Rabinovitch wrote:
> For the IB case, setting and tearing DMA mappings for the drivers data buffers
> is expensive. But we could for example consider to map all the HW control 
> objects
> that validate the HW access to the drivers data buffers as IOMMU protected 
> and so
> have IOMMU protection on those critical objects while having fast 
> set-up/tear-down
> of driver data buffers. The HW control objects have stable mapping for the 
> lifetime
> of the system. So the case of using both types of DMA mappings is still valid.

How do you envision this per-mapping by-pass to work? For the DMA-API
mappings you have only one device address space. For by-pass to work,
you need to map all physical memory of the machine into this space,
which leaves the question where you want to put the window for
remapping.

You surely have to put it after the physical mappings, but any
protection will be gone, as the device can access all physical memory.

So instead of working around the shortcomings of DMA-API
implementations, can you present us some numbers and analysis of how bad
the performance impact with an IOMMU is and what causes it?

I know that we have lock-contention issues in our IOMMU drivers, which
can be fixed. Maybe the performance problems you are seeing can be fixed
too, when you give us more details about them.


Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Will Deacon
On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote:
> Hi,

Hello Andre,

> this series cleans up kvmtool's kernel loading functionality a bit.
> It has been broken out of a previous series I sent [1] and contains
> just the cleanup and bug fix parts, which should be less controversial
> and thus easier to merge ;-)
> I will resend the pipe loading part later on as a separate series.
> 
> The first patch properly abstracts kernel loading to move
> responsibility into each architecture's code. It removes quite some
> ugly code from the generic kvm.c file.
> The later patches address the naive usage of read(2) to, well, read
> data from files. Doing this without coping with the subtleties of
> the UNIX read semantics (returning with less or none data read is not
> an error) can provoke hard to debug failures.
> So these patches make use of the existing and one new wrapper function
> to make sure we read everything we actually wanted to.
> The last patch moves the ARM kernel loading code into the proper
> location to be in line with the other architectures.
> 
> Please have a look and give some comments!

Looks good to me, but I'd like to see some comments from some mips/ppc/x86
people on the changes you're making over there.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Will Deacon
On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote:
> Hi,

Hello Andre,

> this series cleans up kvmtool's kernel loading functionality a bit.
> It has been broken out of a previous series I sent [1] and contains
> just the cleanup and bug fix parts, which should be less controversial
> and thus easier to merge ;-)
> I will resend the pipe loading part later on as a separate series.
> 
> The first patch properly abstracts kernel loading to move
> responsibility into each architecture's code. It removes quite some
> ugly code from the generic kvm.c file.
> The later patches address the naive usage of read(2) to, well, read
> data from files. Doing this without coping with the subtleties of
> the UNIX read semantics (returning with less or none data read is not
> an error) can provoke hard to debug failures.
> So these patches make use of the existing and one new wrapper function
> to make sure we read everything we actually wanted to.
> The last patch moves the ARM kernel loading code into the proper
> location to be in line with the other architectures.
> 
> Please have a look and give some comments!

Looks good to me, but I'd like to see some comments from some mips/ppc/x86
people on the changes you're making over there.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-02 Thread Xiao Guangrong



On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 16:08, Xiao Guangrong wrote:



On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Curretly, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Current code did not check the return value of get_memory_region as it
assumed the backend memory of pc-dimm is always properly initialized,
we make get_memory_region internally catch the case if something is
wrong


but here you call not pc-dimm's get_memory_region, but common 
ddc->get_memory_region, which may be
nvdimm or possibly other future dimm, so, why not check it here? And than 
pc_dimm_get_memory_region
may be left untouched (error_abort is ok, because errp is unused).


Hmm, because 'here' is not the only place calling ->get_memory_region, this 
method has
multiple callers:

$ git grep "\->get_memory_region"
hw/i386/pc.c:MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/i386/pc.c:MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/mem/dimm.c:mr = ddc->get_memory_region(dimm);
hw/mem/nvdimm.c:ddc->get_memory_region = nvdimm_get_memory_region;
hw/mem/pc-dimm.c:ddc->get_memory_region = pc_dimm_get_memory_region;
hw/ppc/spapr.c:MemoryRegion *mr = ddc->get_memory_region(dimm);

memory region validation is also done for NVDIMM in nvdimm device.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:37PM +0100, Christian Borntraeger wrote:
> As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
> Make use of dev_archdata to keep the dma_ops per device. The pci devices
> now use that to override the default, and the default is changed to use
> the noop ops for everything that is not PCI. To compile without PCI
> support we also have to enable the DMA api with virtio.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  arch/s390/Kconfig   | 3 ++-
>  arch/s390/include/asm/device.h  | 6 +-
>  arch/s390/include/asm/dma-mapping.h | 6 --
>  arch/s390/pci/pci.c | 1 +
>  arch/s390/pci/pci_dma.c | 4 ++--
>  5 files changed, 14 insertions(+), 6 deletions(-)

Reviewed-by: Joerg Roedel 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic

2015-11-02 Thread Paolo Bonzini


On 02/11/2015 15:59, Radim Krcmar wrote:
> > We do not want to do too much work in atomic context, in particular
> > not walking all the VCPUs of the virtual machine.  So we want
> > to distinguish the architecture-specific injection function for irqfd
> > from kvm_set_msi.  Since it's still empty, reuse the newly added
> > kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.
> 
> kvm/queue uses kvm_arch_set_irq since b7184313f4b9 ("kvm/x86: Hyper-V
> synthetic interrupt controller").
> 
> Is synic going to be dropped before this patch is merged?

Yes.  Both because the Virtuozzo people confirmed that kvm_arch_set_irq
isn't needed for synic, and because synic is currently broken with APICv.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-02 Thread Xiao Guangrong



On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Curretly, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Current code did not check the return value of get_memory_region as it
assumed the backend memory of pc-dimm is always properly initialized,
we make get_memory_region internally catch the case if something is
wrong

Signed-off-by: Xiao Guangrong 
---
  hw/mem/dimm.c|  3 ++-
  hw/mem/pc-dimm.c | 12 +++-
  2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 4a63409..498d380 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor *v, void 
*opaque,
  int64_t value;
  MemoryRegion *mr;
  DIMMDevice *dimm = DIMM(obj);
+DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
-mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+mr = ddc->get_memory_region(dimm);
  value = memory_region_size(mr);
  visit_type_int(v, , name, errp);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 38323e9..e6b6a9f 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -22,7 +22,17 @@
  static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
  {
-return host_memory_backend_get_memory(dimm->hostmem, _abort);
+Error *local_err = NULL;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(dimm->hostmem, _err);
+
+/*
+ * plug a pc-dimm device whose backend memory was not properly
+ * initialized?
+ */
+assert(!local_err && mr);
+return mr;
  }


this should squashed into previous patch, I think


You mean merger this patch with 19/35 (dimm: abstract dimm device from pc-dimm)?
The 19/35 mostly ‘moves’ the things, this one changes the core logic, it is not
a big deal. :D




  static void pc_dimm_class_init(ObjectClass *oc, void *data)


I've discovered suddenly, that

MemoryRegion *
host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
{
 return memory_region_size(>mr) ? >mr : NULL;
}

- it doesn't use errp at all. In my opinion, this should be fixed globally, by 
deleting useless
parameter in separate patch. Or just squash your function into previous patch.



Yup, this is a globally interface so i prefer to make a separate patch to do the
cleanup after this patchset.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Provide simple noop dma ops

2015-11-02 Thread Christian Borntraeger
Am 02.11.2015 um 16:16 schrieb Joerg Roedel:
> On Fri, Oct 30, 2015 at 02:20:35PM +0100, Christian Borntraeger wrote:
>> +static void *dma_noop_alloc(struct device *dev, size_t size,
>> +dma_addr_t *dma_handle, gfp_t gfp,
>> +struct dma_attrs *attrs)
>> +{
>> +void *ret;
>> +
>> +ret = (void *)__get_free_pages(gfp, get_order(size));
>> +if (ret) {
>> +memset(ret, 0, size);
> 
> There is no need to zero out the memory here. If the user wants
> initialized memory it can call dma_zalloc_coherent. Having the memset
> here means to clear the memory twice in the dma_zalloc_coherent path.
> 
> Otherwise it looks good.

Thanks. Will fix.
In addition I will also make the compilation of dma-noop.o dependent 
on HAS_DMA.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 18:06, Xiao Guangrong wrote:



On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 16:08, Xiao Guangrong wrote:



On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Curretly, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Current code did not check the return value of get_memory_region 
as it

assumed the backend memory of pc-dimm is always properly initialized,
we make get_memory_region internally catch the case if something is
wrong


but here you call not pc-dimm's get_memory_region, but common 
ddc->get_memory_region, which may be
nvdimm or possibly other future dimm, so, why not check it here? And 
than pc_dimm_get_memory_region

may be left untouched (error_abort is ok, because errp is unused).


Hmm, because 'here' is not the only place calling ->get_memory_region, 
this method has

multiple callers:

$ git grep "\->get_memory_region"
hw/i386/pc.c:MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/i386/pc.c:MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/mem/dimm.c:mr = ddc->get_memory_region(dimm);
hw/mem/nvdimm.c:ddc->get_memory_region = nvdimm_get_memory_region;
hw/mem/pc-dimm.c:ddc->get_memory_region = pc_dimm_get_memory_region;
hw/ppc/spapr.c:MemoryRegion *mr = ddc->get_memory_region(dimm);

memory region validation is also done for NVDIMM in nvdimm device.

Ok, then it should be documented by a comment in dimm.h, where 
DIMMDeviceClass is defined, that this function should not fail


--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 00/14] ppc64: initial drop

2015-11-02 Thread Thomas Huth
On 03/08/15 16:41, Andrew Jones wrote:
> This series is the first series of a series of series that will
> bring support to kvm-unit-tests for ppc64, and eventually ppc64le.

 Hi Andrew,

may I ask about the current state of ppc64 support in the
kvm-unit-tests? Is there a newer version available than the one you
posted three months ago?

 Thanks,
  Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 00/14] ppc64: initial drop

2015-11-02 Thread Thomas Huth
On 03/08/15 16:41, Andrew Jones wrote:
> This series is the first series of a series of series that will
> bring support to kvm-unit-tests for ppc64, and eventually ppc64le.

 Hi Andrew,

may I ask about the current state of ppc64 support in the
kvm-unit-tests? Is there a newer version available than the one you
posted three months ago?

 Thanks,
  Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v7 03/35] acpi: add aml_create_field

2015-11-02 Thread Shannon Zhao


On 2015/11/2 17:13, Xiao Guangrong wrote:
> Implement CreateField term which is used by NVDIMM _DSM method in later patch
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  hw/acpi/aml-build.c | 13 +
>  include/hw/acpi/aml-build.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index a72214d..9fe5e7b 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1151,6 +1151,19 @@ Aml *aml_sizeof(Aml *arg)
>  return var;
>  }
>  
> +/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
> +{
> +Aml *var = aml_alloc();
> +build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
> +build_append_byte(var->buf, 0x13); /* CreateFieldOp */
> +aml_append(var, srcbuf);
> +aml_append(var, index);
> +aml_append(var, len);
> +build_append_namestring(var->buf, "%s", name);
> +return var;
> +}
> +
>  void
>  build_header(GArray *linker, GArray *table_data,
>   AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 7296efb..7e1c43b 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -276,6 +276,7 @@ Aml *aml_touuid(const char *uuid);
>  Aml *aml_unicode(const char *str);
>  Aml *aml_derefof(Aml *arg);
>  Aml *aml_sizeof(Aml *arg);
> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
>  
Maybe this could be moved together with existing aml_create_dword_field.

>  void
>  build_header(GArray *linker, GArray *table_data,
> 

-- 
Shannon

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net

2015-11-02 Thread Jason Wang


On 10/30/2015 07:58 PM, Jason Wang wrote:
>
> On 10/29/2015 04:45 PM, Jason Wang wrote:
>> Hi all:
>>
>> This series tries to add basic busy polling for vhost net. The idea is
>> simple: at the end of tx processing, busy polling for new tx added
>> descriptor and rx receive socket for a while. The maximum number of
>> time (in us) could be spent on busy polling was specified through
>> module parameter.
>>
>> Test were done through:
>>
>> - 50 us as busy loop timeout
>> - Netperf 2.6
>> - Two machines with back to back connected mlx4
>> - Guest with 8 vcpus and 1 queue
>>
>> Result shows very huge improvement on both tx (at most 158%) and rr
>> (at most 53%) while rx is as much as in the past. Most cases the cpu
>> utilization is also improved:
>>
> Just notice there's something wrong in the setup. So the numbers are
> incorrect here. Will re-run and post correct number here.
>
> Sorry.

Here's the updated testing result:

1) 1 vcpu 1 queue:

TCP_RR
size/session/+thu%/+normalize%
1/ 1/0%/  -25%
1/50/  +12%/0%
1/   100/  +12%/   +1%
1/   200/   +9%/   -1%
   64/ 1/   +3%/  -21%
   64/50/   +8%/0%
   64/   100/   +7%/0%
   64/   200/   +9%/0%
  256/ 1/   +1%/  -25%
  256/50/   +7%/   -2%
  256/   100/   +6%/   -2%
  256/   200/   +4%/   -2%
  512/ 1/   +2%/  -19%
  512/50/   +5%/   -2%
  512/   100/   +3%/   -3%
  512/   200/   +6%/   -2%
 1024/ 1/   +2%/  -20%
 1024/50/   +3%/   -3%
 1024/   100/   +5%/   -3%
 1024/   200/   +4%/   -2%
Guest RX
size/session/+thu%/+normalize%
   64/ 1/   -4%/   -5%
   64/ 4/   -3%/  -10%
   64/ 8/   -3%/   -5%
  512/ 1/  +15%/   +1%
  512/ 4/   -5%/   -5%
  512/ 8/   -2%/   -4%
 1024/ 1/   -5%/  -16%
 1024/ 4/   -2%/   -5%
 1024/ 8/   -6%/   -6%
 2048/ 1/  +10%/   +5%
 2048/ 4/   -8%/   -4%
 2048/ 8/   -1%/   -4%
 4096/ 1/   -9%/  -11%
 4096/ 4/   +1%/   -1%
 4096/ 8/   +1%/0%
16384/ 1/  +20%/  +11%
16384/ 4/0%/   -3%
16384/ 8/   +1%/0%
65535/ 1/  +36%/  +13%
65535/ 4/  -10%/   -9%
65535/ 8/   -3%/   -2%
Guest TX
size/session/+thu%/+normalize%
   64/ 1/   -7%/  -16%
   64/ 4/  -14%/  -23%
   64/ 8/   -9%/  -20%
  512/ 1/  -62%/  -56%
  512/ 4/  -62%/  -56%
  512/ 8/  -61%/  -53%
 1024/ 1/  -66%/  -61%
 1024/ 4/  -77%/  -73%
 1024/ 8/  -73%/  -67%
 2048/ 1/  -74%/  -75%
 2048/ 4/  -77%/  -74%
 2048/ 8/  -72%/  -68%
 4096/ 1/  -65%/  -68%
 4096/ 4/  -66%/  -63%
 4096/ 8/  -62%/  -57%
16384/ 1/  -25%/  -28%
16384/ 4/  -28%/  -17%
16384/ 8/  -24%/  -10%
65535/ 1/  -17%/  -14%
65535/ 4/  -22%/   -5%
65535/ 8/  -25%/   -9%

- obvious improvement on TCP_RR (at most 12%)
- improvement on guest RX
- huge decreasing on Guest TX (at most -75%), this is probably because
virtio-net driver suffers from buffer bloat by orphaning skb before
transmission. The faster vhost it is, the smaller packet it could
produced. To reduce the impact on this, turning off gso in guest can
result the following result:

size/session/+thu%/+normalize%
   64/ 1/   +3%/  -11%
   64/ 4/   +4%/  -10%
   64/ 8/   +4%/  -10%
  512/ 1/   +2%/   +5%
  512/ 4/0%/   -1%
  512/ 8/0%/0%
 1024/ 1/  +11%/0%
 1024/ 4/0%/   -1%
 1024/ 8/   +3%/   +1%
 2048/ 1/   +4%/   -1%
 2048/ 4/   +8%/   +3%
 2048/ 8/0%/   -1%
 4096/ 1/   +4%/   -1%
 4096/ 4/   +1%/0%
 4096/ 8/   +2%/0%
16384/ 1/   +2%/   -2%
16384/ 4/   +3%/   +1%
16384/ 8/0%/   -1%
65535/ 1/   +9%/   +7%
65535/ 4/0%/   -3%
65535/ 8/   -1%/   -1%

2) 8 vcpus 1 queue:

TCP_RR
size/session/+thu%/+normalize%
1/ 1/   +5%/  -14%
1/50/   +2%/   +1%
1/   100/0%/   -1%
1/   200/0%/0%
   64/ 1/0%/  -25%
   64/50/   +5%/   +5%
   64/   100/0%/0%
   64/   200/0%/   -1%
  256/ 1/0%/  -30%
  256/50/0%/0%
  256/   100/   -2%/   -2%
  256/   200/0%/0%
  512/ 1/   +1%/  -23%
  512/50/   +1%/   +1%
  512/   100/   +1%/0%
  512/   200/   +1%/   +1%
 1024/ 1/   +1%/  -23%
 1024/50/   +5%/   +5%
 1024/   100/0%/   -1%
 1024/   200/0%/0%
Guest RX
size/session/+thu%/+normalize%
   64/ 1/   +1%/   +1%
   64/ 4/   -2%/   +1%
   64/ 8/   +6%/  +19%
  512/ 1/   +5%/   -7%
  512/ 4/   -4%/   -4%
  512/ 8/0%/0%
 1024/ 1/   +1%/   +2%
 1024/ 4/   -2%/   -2%
 1024/ 8/   -1%/   +7%
 2048/ 1/   +8%/   -2%
 2048/ 4/0%/   +5%
 2048/ 8/   -1%/  +13%
 4096/ 1/   -1%/   +2%
 4096/ 4/0%/   +6%
 4096/ 8/   -2%/  +15%
16384/ 1/   -1%/0%
16384/ 4/   -2%/   -1%
16384/ 8/   -2%/   +2%
65535/ 1/   -2%/0%
65535/ 4/   -3%/   -3%
65535/ 8/   -2%/   +2%
Guest TX
size/session/+thu%/+normalize%
   64/ 1/   +6%/   

RE: [PATCH v4 0/3] KVM: arm/arm64: Clean up some obsolete code

2015-11-02 Thread Pavel Fedin
 Hello!

> I ran this through my test scripts and I'm now quite sure that there's
> some breakage in here.
> 
> One of my tests is running two VMs in parallel, each booting up, running
> hackbench, and then doing reboot (from within the guest), and just
> repeating like that.
> 
> I've run your patches in the above config 100 times, and every time,
> the rebooting VMs got stuck before 50 reboots.
> 
> Without these patches, I could run the above config 100 times, and every
> time, the rebooting VMs passed 200 reboots.

 Huh, the description looks like some problem with vgic_retire_disabled_irqs(). 
By the way, during reboot, who does call it? The
only call i see is in vgic_handle_enable_reg(), which obviously just processes 
emulated register accesses...
 And the only thing i know is that in case of GICv2 the userland resets vGIC 
manually by resetting each register to its default
value (therefore all ENABLER are set to 0). At least qemu does this, and i'm 
not sure about kvmtool. And in case of vGICv3 nobody
can do this because there's no API to set registers yet. So, could we be 
rebooting with interrupts enabled or something like that?
 So: what kind of container are you running and what vGIC version? Does this 
problem reproduce with both vGICv2 and vGICv3?

 By this time i'll make a very minimal version of patch 0001, for you to test 
it. If we have problems with current 0001, which we
cannot solve quickly, we could stick to that version then, which will provide 
the necessary changes to plug in LPIs, yet with
minimal changes (it will only remove vgic_irq_lr_map).
 I guess i should have done it before. Or, i could even respin v5, with current 
0001 split up. This should make it easier to bisect
the problem.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 05/21] KVM: ARM64: Add reset and access handlers for PMSELR register

2015-11-02 Thread Christopher Covington
On 10/30/2015 02:21 AM, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Since the reset value of PMSELR_EL0 is UNKNOWN, use reset_unknown for
> its reset handler. As it doesn't need to deal with the acsessing action

Nit: accessing

> specially, it uses default case to emulate writing and reading PMSELR
> register.
> 
> Add a helper for CP15 registers reset to UNKNOWN.
> 
> Signed-off-by: Shannon Zhao 

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost: move is_le setup to the backend

2015-11-02 Thread David Miller
From: Greg Kurz 
Date: Fri, 30 Oct 2015 12:42:35 +0100

> The vq->is_le field is used to fix endianness when accessing the vring via
> the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
> 
> 1) host is big endian and device is modern virtio
> 
> 2) host has cross-endian support and device is legacy virtio with a different
>endianness than the host
> 
> Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the
> VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le
> is only needed when the backend is active, it was decided to set it at
> backend start.
> 
> This is currently done in vhost_init_used()->vhost_init_is_le() but it
> obfuscates the core vhost code. This patch moves the is_le setup to a
> dedicated function that is called from the backend code.
> 
> Note vhost_net is the only backend that can pass vq->private_data == NULL to
> vhost_init_used(), hence the "if (sock)" branch.
> 
> No behaviour change.
> 
> Signed-off-by: Greg Kurz 

Michael, I'm assuming that you will be the one taking this.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Benjamin Herrenschmidt
On Mon, 2015-11-02 at 14:07 +0200, Shamir Rabinovitch wrote:
> On Mon, Nov 02, 2015 at 09:00:34PM +1100, Benjamin Herrenschmidt
> wrote:
> > 
> > Chosing on a per-mapping basis *in the back end* might still make
> > some
> 
> In my case, choosing mapping based on the hardware that will use this
> mappings makes more sense. Most hardware are not that performance 
> sensitive as the Infiniband hardware.

 ...

> The driver know for what hardware it is mapping the memory so it know 
> if the memory will be used by performance sensitive hardware or not.

Then I would argue for naming this differently. Make it an optional
hint "DMA_ATTR_HIGH_PERF" or something like that. Whether this is
achieved via using a bypass or other means in the backend not the
business of the driver.

> In your case, what will give the better performance - 1:1 mapping or
> IOMMU
> mapping? When you say 'relaxing the protection' you refer to 1:1
> mapping?

> Also, how this 1:1 window address the security concerns that other
> raised
> by other here?

It will partially only but it's just an example of another way the
bakcend could provide some improved performances without a bypass.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 07/21] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function

2015-11-02 Thread Christopher Covington
On 10/30/2015 02:21 AM, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> When we use tools like perf on host, perf passes the event type and the
> id of this event type category to kernel, then kernel will map them to
> hardware event number and write this number to PMU PMEVTYPER_EL0
> register. When getting the event number in KVM, directly use raw event
> type to create a perf_event for it.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/include/asm/pmu.h |   2 +
>  arch/arm64/kvm/Makefile  |   1 +
>  include/kvm/arm_pmu.h|  13 +
>  virt/kvm/arm/pmu.c   | 117 
> +++
>  4 files changed, 133 insertions(+)
>  create mode 100644 virt/kvm/arm/pmu.c
> 
> diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
> index b9f394a..2c025f2 100644
> --- a/arch/arm64/include/asm/pmu.h
> +++ b/arch/arm64/include/asm/pmu.h
> @@ -31,6 +31,8 @@
>  #define ARMV8_PMCR_D (1 << 3) /* CCNT counts every 64th cpu cycle */
>  #define ARMV8_PMCR_X (1 << 4) /* Export to ETM */
>  #define ARMV8_PMCR_DP(1 << 5) /* Disable CCNT if 
> non-invasive debug*/
> +/* Determines which PMCCNTR_EL0 bit generates an overflow */
> +#define ARMV8_PMCR_LC(1 << 6)
>  #define  ARMV8_PMCR_N_SHIFT  11   /* Number of counters 
> supported */
>  #define  ARMV8_PMCR_N_MASK   0x1f
>  #define  ARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 1949fe5..18d56d8 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -27,3 +27,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> +kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 254d2b4..1908c88 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -38,4 +38,17 @@ struct kvm_pmu {
>  #endif
>  };
>  
> +#ifdef CONFIG_KVM_ARM_PMU
> +unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 
> select_idx);
> +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
> + u32 select_idx);
> +#else
> +unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 
> select_idx)
> +{
> + return 0;
> +}
> +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
> + u32 select_idx) {}
> +#endif
> +
>  #endif
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> new file mode 100644
> index 000..900a64c
> --- /dev/null
> +++ b/virt/kvm/arm/pmu.c
> @@ -0,0 +1,117 @@
> +/*
> + * Copyright (C) 2015 Linaro Ltd.
> + * Author: Shannon Zhao 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/**
> + * kvm_pmu_get_counter_value - get PMU counter value
> + * @vcpu: The vcpu pointer
> + * @select_idx: The counter index
> + */
> +unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 
> select_idx)
> +{
> + u64 counter, enabled, running;
> + struct kvm_pmu *pmu = >arch.pmu;
> + struct kvm_pmc *pmc = >pmc[select_idx];
> +
> + if (!vcpu_mode_is_32bit(vcpu))
> + counter = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + select_idx);
> + else
> + counter = vcpu_cp15(vcpu, c14_PMEVCNTR0 + select_idx);
> +
> + if (pmc->perf_event)
> + counter += perf_event_read_value(pmc->perf_event, ,
> +  );
> +
> + return counter & pmc->bitmask;
> +}
> +
> +/**
> + * kvm_pmu_stop_counter - stop PMU counter
> + * @pmc: The PMU counter pointer
> + *
> + * If this counter has been configured to monitor some event, release it 
> here.
> + */
> +static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
> +{
> + struct kvm_vcpu *vcpu = pmc->vcpu;
> + u64 counter;
> +
> + if (pmc->perf_event) {
> + counter = kvm_pmu_get_counter_value(vcpu, pmc->idx);
> + if (!vcpu_mode_is_32bit(vcpu))
> + vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + pmc->idx) = counter;
> + 

Re: [PATCH/RFC 0/4] dma ops and virtio

2015-11-02 Thread Andy Lutomirski
On Mon, Nov 2, 2015 at 3:16 AM, Cornelia Huck  wrote:
> On Fri, 30 Oct 2015 13:33:07 -0700
> Andy Lutomirski  wrote:
>
>> On Fri, Oct 30, 2015 at 1:25 AM, Cornelia Huck  
>> wrote:
>> > On Thu, 29 Oct 2015 15:50:38 -0700
>> > Andy Lutomirski  wrote:
>> >
>> >> Progress!  After getting that sort-of-working, I figured out what was
>> >> wrong with my earlier command, and I got that working, too.  Now I
>> >> get:
>> >>
>> >> qemu-system-s390x -fsdev
>> >> local,id=virtfs1,path=/,security_model=none,readonly -device
>> >> virtio-9p-ccw,fsdev=virtfs1,mount_tag=/dev/root -M s390-ccw-virtio
>> >> -nodefaults -device sclpconsole,chardev=console -parallel none -net
>> >> none -echr 1 -serial none -chardev stdio,id=console,signal=off,mux=on
>> >> -serial chardev:console -mon chardev=console -vga none -display none
>> >> -kernel arch/s390/boot/bzImage -append
>> >> 'init=/home/luto/devel/virtme/virtme/guest/virtme-init
>> >> psmouse.proto=exps "virtme_stty_con=rows 24 cols 150 iutf8"
>> >> TERM=xterm-256color rootfstype=9p
>> >> rootflags=ro,version=9p2000.L,trans=virtio,access=any
>> >> raid=noautodetect debug'
>> >
>> > The commandline looks sane AFAICS.
>> >
>> > (...)
>> >
>> >> vrfy: device 0.0.: rc=0 pgroup=0 mpath=0 vpm=80
>> >> virtio_ccw 0.0.: Failed to set online: -5
>> >>
>> >> ^^^ bad news!
>> >
>> > I'd like to see where in the onlining process this fails. Could you set
>> > up qemu tracing for css_* and virtio_ccw_* (instructions in
>> > qemu/docs/tracing.txt)?
>>
>> I have a file called events that contains:
>>
>> css_*
>> virtio_ccw_*
>>
>> pointing -trace events= at it results in a trace- file that's 549
>> bytes long and contains nothing.  Are wildcards not as well-supported
>> as the docs suggest?
>
> Just tried it, seemed to work for me as expected. And as your messages
> indicate, at least some of the css tracepoints are guaranteed to be
> hit. Odd.
>
> Can you try the following sophisticated printf debug patch?
>
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index c033612..6a87bd6 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -308,6 +308,8 @@ static int css_interpret_ccw(SubchDev *sch, hwaddr 
> ccw_addr)
>  sch->ccw_no_data_cnt++;
>  }
>
> +fprintf(stderr, "CH DBG: %s: cmd_code=%x\n", __func__, ccw.cmd_code);
> +
>  /* Look at the command. */
>  switch (ccw.cmd_code) {
>  case CCW_CMD_NOOP:
> @@ -375,6 +377,7 @@ static int css_interpret_ccw(SubchDev *sch, hwaddr 
> ccw_addr)
>  }
>  break;
>  }
> +fprintf(stderr, "CH DBG: %s: ret=%d\n", __func__, ret);
>  sch->last_cmd = ccw;
>  sch->last_cmd_valid = true;
>  if (ret == 0) {
>
>
>> > Which qemu version is this, btw.?
>> >
>>
>> git from yesterday.
>
> Hm. Might be worth trying the s390-ccw-virtio-2.4 machine instead.
>

No change.

With s390-ccw-virtio-2.4, I get:

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Initializing cgroup subsys cpuacct
Linux version 4.3.0-rc7-8-gff230d6ec6b2
(l...@amaluto.corp.amacapital.net) (gcc version 5.1.1 20150618 (Red
Hat Cross 5.1.1-3) (GCC) ) #344 SMP Fri Oct 30 13:16:13 PDT 2015
setup: Linux is running under KVM in 64-bit mode
setup: Max memory size: 128MB
Zone ranges:
  DMA  [mem 0x-0x7fff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x07ff]
Initmem setup node 0 [mem 0x-0x07ff]
On node 0 totalpages: 32768
  DMA zone: 512 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 32768 pages, LIFO batch:7
PERCPU: Embedded 466 pages/cpu @07605000 s1868032 r8192 d32512 u1908736
pcpu-alloc: s1868032 r8192 d32512 u1908736 alloc=466*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32256
Kernel command line:
init=/home/luto/devel/virtme/virtme/guest/virtme-init
psmouse.proto=exps "virtme_stty_con=rows 45 cols 150 iutf8"
TERM=xterm-256color rootfstype=9p
rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect
ro debug
PID hash table entries: 512 (order: 0, 4096 bytes)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Memory: 92520K/131072K available (8255K kernel code, 802K rwdata,
3860K rodata, 2384K init, 14382K bss, 38552K reserved, 0K
cma-reserved)
Write protected kernel read-only data: 0x10 - 0xcd4fff
SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Running RCU self tests
Hierarchical RCU implementation.
RCU debugfs-based tracing is enabled.
RCU lockdep checking is enabled.
Build-time adjustment of leaf fanout to 64.
RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=2
NR_IRQS:3
clocksource: tod: mask: 0x max_cycles:

[PATCH] KVM: VMX: Fix commit which broke PML

2015-11-02 Thread Kai Huang
I found PML was broken since below commit:

commit feda805fe7c4ed9cf78158e73b1218752e3b4314
Author: Xiao Guangrong 
Date:   Wed Sep 9 14:05:55 2015 +0800

KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

Unify the update in vmx_cpuid_update()

Signed-off-by: Xiao Guangrong 
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini 

The reason is PML after above commit vmx_cpuid_update calls
vmx_secondary_exec_control, in which PML is disabled unconditionally, as PML is
enabled in creating vcpu. Therefore if vcpu_cpuid_update is called after vcpu is
created, PML will be disabled unexpectedly while log-dirty code still think PML
is used. Actually looks calling vmx_secondary_exec_control in vmx_cpuid_update
is likely to break any VMX features that is enabled/disabled on demand by
updating SECONDARY_VM_EXEC_CONTROL, if vmx_cpuid_update is called between the
feature is enabled and disabled.

Fix this by calling vmcs_read32 to read out SECONDARY_VM_EXEC_CONTROL directly.

Signed-off-by: Kai Huang 
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4d0aa31..4525c0a7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8902,7 +8902,7 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u32 secondary_exec_ctl = vmx_secondary_exec_control(vmx);
+   u32 secondary_exec_ctl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
 
if (vmx_rdtscp_supported()) {
bool rdtscp_enabled = guest_cpuid_has_rdtscp(vcpu);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 13/35] hostmem-file: use whole file size if possible

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 12:13, Xiao Guangrong wrote:

Use the whole file size if @size is not specified which is useful
if we want to directly pass a file to guest

Signed-off-by: Xiao Guangrong 
---
  backends/hostmem-file.c | 22 ++
  1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 9097a57..ea355c1 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -38,15 +38,29 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
  {
  HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
  
-if (!backend->size) {

-error_setg(errp, "can't create backend with size 0");
-return;
-}
  if (!fb->mem_path) {
  error_setg(errp, "mem-path property not set");
  return;
  }
  
+if (!backend->size) {

+Error *local_err = NULL;
+
+/*
+ * use the whole file size if @size is not specified.
+ */
+backend->size = qemu_file_getlength(fb->mem_path, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
+if (!backend->size) {
+error_setg(errp, "can't create backend on the file whose size is 0");
+return;
+}
+
  backend->force_prealloc = mem_prealloc;
  memory_region_init_ram_from_file(>mr, OBJECT(backend),
   object_get_canonical_path(OBJECT(backend)),


why not just

+if (!backend->size) {
+/*
+ * use the whole file size if @size is not specified.
+ */
+backend->size = qemu_file_getlength(fb->mem_path, errp);
+if (*errp) {
+return;
+}
+}


what the purpose of propagating?

--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Shamir Rabinovitch
On Mon, Nov 02, 2015 at 03:44:27PM +0100, Joerg Roedel wrote:
> 
> How do you envision this per-mapping by-pass to work? For the DMA-API
> mappings you have only one device address space. For by-pass to work,
> you need to map all physical memory of the machine into this space,
> which leaves the question where you want to put the window for
> remapping.
> 
> You surely have to put it after the physical mappings, but any
> protection will be gone, as the device can access all physical memory.

Correct. This issue is one of the concerns here in the previous replies.
I will take different approach which will not require the IOMMU bypass
per mapping. Will try to shift to the x86 'iommu=pt' approach.

> 
> So instead of working around the shortcomings of DMA-API
> implementations, can you present us some numbers and analysis of how bad
> the performance impact with an IOMMU is and what causes it?

We had a bunch of issues around SPARC IOMMU. Not all of them relate to
performance. The first issue was that on SPARC, currently, we only have 
limited address space to IOMMU so we had issue to do large DMA mappings
for Infiniband. Second issue was that we identified high contention on 
the IOMMU locks even in ETH driver. 

> 
> I know that we have lock-contention issues in our IOMMU drivers, which
> can be fixed. Maybe the performance problems you are seeing can be fixed
> too, when you give us more details about them.
> 

I do not want to put too much information here but you can see some results:

rds-stress test from sparc t5-2 -> x86:

with iommu bypass:
-
sparc->x86 cmdline = -r XXX -s XXX -q 256 -a 8192 -T 10 -d 10 -t 3 -o XXX
tsks   tx/s   rx/s  tx+rx K/smbi K/smbo K/s tx us/c   rtt us cpu %
   3 141278  0 1165565.81   0.00   0.008.93   376.60 -1.00  
(average)

without iommu bypass:
-
sparc->x86 cmdline = -r XXX -s XXX -q 256 -a 8192 -T 10 -d 10 -t 3 -o XXX
tsks   tx/s   rx/s  tx+rx K/smbi K/smbo K/s tx us/c   rtt us cpu %
   3  78558  0  648101.41   0.00   0.00   15.05   876.72 -1.00  
(average)

+ RDMA tests are totally not working (might be due to failure to DMA map all 
the memory).

So IOMMU bypass give ~80% performance boost.

> 
>   Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-11-02 Thread Andrew Jones
On Fri, Oct 30, 2015 at 03:32:43PM -0400, Christopher Covington wrote:
> Hi Drew,
> 
> On 10/30/2015 09:00 AM, Andrew Jones wrote:
> > On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
> >> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> >> PMU cycle counter values. The code includes a strict checking facility
> >> intended for the -icount option in TCG mode but it is not yet enabled
> >> in the configuration file. Enabling it must wait on infrastructure
> >> improvements which allow for different tests to be run on TCG versus
> >> KVM.
> >>
> >> Signed-off-by: Christopher Covington 
> >> ---
> >>  arm/pmu.c | 103 
> >> +-
> >>  1 file changed, 102 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arm/pmu.c b/arm/pmu.c
> >> index 4334de4..76a 100644
> >> --- a/arm/pmu.c
> >> +++ b/arm/pmu.c
> >> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
> >>asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> >>return cycles;
> >>  }
> >> +
> >> +/*
> >> + * Extra instructions inserted by the compiler would be difficult to 
> >> compensate
> >> + * for, so hand assemble everything between, and including, the PMCR 
> >> accesses
> >> + * to start and stop counting.
> >> + */
> >> +static inline void loop(int i, uint32_t pmcr)
> >> +{
> >> +  asm volatile(
> >> +  "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
> >> +  "1: subs%[i], %[i], #1\n"
> >> +  "   bgt 1b\n"
> >> +  "   mcr p15, 0, %[z], c9, c12, 0\n"
> >> +  : [i] "+r" (i)
> >> +  : [pmcr] "r" (pmcr), [z] "r" (0)
> >> +  : "cc");
> >> +}
> >>  #elif defined(__aarch64__)
> >>  static inline uint32_t get_pmcr(void)
> >>  {
> >> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
> >>asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> >>return cycles;
> >>  }
> >> +
> >> +/*
> >> + * Extra instructions inserted by the compiler would be difficult to 
> >> compensate
> >> + * for, so hand assemble everything between, and including, the PMCR 
> >> accesses
> >> + * to start and stop counting.
> >> + */
> >> +static inline void loop(int i, uint32_t pmcr)
> >> +{
> >> +  asm volatile(
> >> +  "   msr pmcr_el0, %[pmcr]\n"
> >> +  "1: subs%[i], %[i], #1\n"
> >> +  "   b.gt1b\n"
> >> +  "   msr pmcr_el0, xzr\n"
> >> +  : [i] "+r" (i)
> >> +  : [pmcr] "r" (pmcr)
> >> +  : "cc");
> >> +}
> >>  #endif
> >>  
> >>  struct pmu_data {
> >> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
> >>return true;
> >>  }
> >>  
> >> -int main(void)
> >> +/*
> >> + * Execute a known number of guest instructions. Only odd instruction 
> >> counts
> >> + * greater than or equal to 3 are supported by the in-line assembly code. 
> >> The
> >> + * control register (PMCR_EL0) is initialized with the provided value 
> >> (allowing
> >> + * for example for the cycle counter or event counters to be reset). At 
> >> the end
> >> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> >> + * counting, allowing the cycle counter or event counters to be read at 
> >> the
> >> + * leisure of the calling code.
> >> + */
> >> +static void measure_instrs(int num, uint32_t pmcr)
> >> +{
> >> +  int i = (num - 1) / 2;
> >> +
> >> +  assert(num >= 3 && ((num - 1) % 2 == 0));
> >> +  loop(i, pmcr);
> >> +}
> >> +
> >> +/*
> >> + * Measure cycle counts for various known instruction counts. Ensure that 
> >> the
> >> + * cycle counter progresses (similar to check_cycles_increase() but with 
> >> more
> >> + * instructions and using reset and stop controls). If supplied a 
> >> positive,
> >> + * nonzero CPI parameter, also strictly check that every measurement 
> >> matches
> >> + * it. Strict CPI checking is used to test -icount mode.
> >> + */
> >> +static bool check_cpi(int cpi)
> >> +{
> >> +  struct pmu_data pmu = {0};
> >> +
> >> +  pmu.cycle_counter_reset = 1;
> >> +  pmu.enable = 1;
> >> +
> >> +  if (cpi > 0)
> >> +  printf("Checking for CPI=%d.\n", cpi);
> >> +  printf("instrs : cycles0 cycles1 ...\n");
> >> +
> >> +  for (int i = 3; i < 300; i += 32) {
> >> +  int avg, sum = 0;
> >> +
> >> +  printf("%d :", i);
> >> +  for (int j = 0; j < NR_SAMPLES; j++) {
> >> +  int cycles;
> >> +
> >> +  measure_instrs(i, pmu.pmcr_el0);
> >> +  cycles = get_pmccntr();
> >> +  printf(" %d", cycles);
> >> +
> >> +  if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> >> +  printf("\n");
> >> +  return false;
> >> +  }
> >> +
> >> +  sum += cycles;
> >> +  }
> >> +  avg = sum / NR_SAMPLES;
> >> +  printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> >> +  sum, avg, i / avg, avg / i);
> >> +  }
> >> +
> >> +  

Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-02 Thread Sebastian Ott
On Fri, 30 Oct 2015, Christian Borntraeger wrote:

> As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
> Make use of dev_archdata to keep the dma_ops per device. The pci devices
> now use that to override the default, and the default is changed to use
> the noop ops for everything that is not PCI. To compile without PCI
> support we also have to enable the DMA api with virtio.
> 
> Signed-off-by: Christian Borntraeger 

Acked-by: Sebastian Ott 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v7 09/35] exec: allow file_ram_alloc to work on file

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 18:25, Xiao Guangrong wrote:



On 11/02/2015 11:12 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Currently, file_ram_alloc() only works on directory - it creates a file
under @path and do mmap on it

This patch tries to allow it to work on file directly, if @path is a


It isn't try to allow, it allows, as I understand)...


Err... Sorry for my English, but what is the different between:
”This patch tries to allow it to work on file directly“ and
"This patch allows it to work on file directly"

:(


Not sure that everyone is interested in this nit-picking discussion..

A allows B: if A then B
A tries to allow B: if A then _may be_ B

In any way it doesn't matter.






directory it works as before, otherwise it treats @path as the target
file then directly allocate memory from it

Signed-off-by: Xiao Guangrong 
---
  exec.c | 80 
++

  1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 9075f4d..db0fdaf 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
  }
  #ifdef __linux__
+static bool path_is_dir(const char *path)
+{
+struct stat fs;
+
+return stat(path, ) == 0 && S_ISDIR(fs.st_mode);
+}
+
+static int open_ram_file_path(RAMBlock *block, const char *path, 
size_t size)

+{
+char *filename;
+char *sanitized_name;
+char *c;
+int fd;
+
+if (!path_is_dir(path)) {
+int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
+
+flags |= O_EXCL;
+return open(path, flags);
+}
+
+/* Make name safe to use with mkstemp by replacing '/' with 
'_'. */

+sanitized_name = g_strdup(memory_region_name(block->mr));
+for (c = sanitized_name; *c != '\0'; c++) {
+if (*c == '/') {
+*c = '_';
+}
+}
+filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
+   sanitized_name);
+g_free(sanitized_name);


one empty line will be very nice here, and it was in master branch


+fd = mkstemp(filename);
+if (fd >= 0) {
+unlink(filename);
+/*
+ * ftruncate is not supported by hugetlbfs in older
+ * hosts, so don't bother bailing out on errors.
+ * If anything goes wrong with it under other filesystems,
+ * mmap will fail.
+ */
+if (ftruncate(fd, size)) {
+perror("ftruncate");
+}
+}
+g_free(filename);
+
+return fd;
+}
+
  static void *file_ram_alloc(RAMBlock *block,
  ram_addr_t memory,
  const char *path,
  Error **errp)
  {
-char *filename;
-char *sanitized_name;
-char *c;
  void *area;
  int fd;
  uint64_t pagesize;
@@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
  goto error;
  }
-/* Make name safe to use with mkstemp by replacing '/' with 
'_'. */

-sanitized_name = g_strdup(memory_region_name(block->mr));
-for (c = sanitized_name; *c != '\0'; c++) {
-if (*c == '/')
-*c = '_';
-}
-
-filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
-   sanitized_name);
-g_free(sanitized_name);
+memory = ROUND_UP(memory, pagesize);
-fd = mkstemp(filename);
+fd = open_ram_file_path(block, path, memory);
  if (fd < 0) {
  error_setg_errno(errp, errno,
   "unable to create backing store for path 
%s", path);

-g_free(filename);
  goto error;
  }
-unlink(filename);
-g_free(filename);
-
-memory = ROUND_UP(memory, pagesize);
-
-/*
- * ftruncate is not supported by hugetlbfs in older
- * hosts, so don't bother bailing out on errors.
- * If anything goes wrong with it under other filesystems,
- * mmap will fail.
- */
-if (ftruncate(fd, memory)) {
-perror("ftruncate");
-}
  area = qemu_ram_mmap(fd, memory, pagesize, block->flags & 
RAM_SHARED);

  if (area == MAP_FAILED) {



Reviewed-by: Vladimir Sementsov-Ogievskiy 


Thanks for your review.





--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic

2015-11-02 Thread Paolo Bonzini


On 02/11/2015 18:01, Radim Krcmar wrote:
>> > Yes.  Both because the Virtuozzo people confirmed that kvm_arch_set_irq
>> > isn't needed for synic, and because synic is currently broken with APICv.
> Thanks.
> 
> (We can add direct delivery for |online vcpus| < X if performance with
>  low number of VCPUs happens to regress because of the schedule_work,)

Yeah, we will change the VFIO interrupt handler to non-threaded, which
will do more or less the same (what you lose now through schedule_work,
you will recoup by doing things directly in the ISR).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 102651] vcpuX unhandled rdmsr: 0x570

2015-11-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=102651

Joachim Namislow  changed:

   What|Removed |Added

 CC||jfrie...@hotmail.com

--- Comment #3 from Joachim Namislow  ---
I do see this warning by kernel-4.2.3-300.fc23 of a Fedora 23 system.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Andre Przywara
Hi Dimitri,

On 02/11/15 15:17, Dimitri John Ledkov wrote:
> On 2 November 2015 at 14:58, Will Deacon  wrote:
>> On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote:
>>> Hi,
>>
>> Hello Andre,
>>
>>> this series cleans up kvmtool's kernel loading functionality a bit.
>>> It has been broken out of a previous series I sent [1] and contains
>>> just the cleanup and bug fix parts, which should be less controversial
>>> and thus easier to merge ;-)
>>> I will resend the pipe loading part later on as a separate series.
>>>
>>> The first patch properly abstracts kernel loading to move
>>> responsibility into each architecture's code. It removes quite some
>>> ugly code from the generic kvm.c file.
>>> The later patches address the naive usage of read(2) to, well, read
>>> data from files. Doing this without coping with the subtleties of
>>> the UNIX read semantics (returning with less or none data read is not
>>> an error) can provoke hard to debug failures.
>>> So these patches make use of the existing and one new wrapper function
>>> to make sure we read everything we actually wanted to.
>>> The last patch moves the ARM kernel loading code into the proper
>>> location to be in line with the other architectures.
>>>
>>> Please have a look and give some comments!
>>
>> Looks good to me, but I'd like to see some comments from some mips/ppc/x86
>> people on the changes you're making over there.
> 
> Looks mostly good to me, as one of the kvmtool down streams. Over at
> https://github.com/clearlinux/kvmtool we have some patches to tweak
> the x86 boot flow, which will need rebasing/retweaking) specifically
> this commit here -
> https://github.com/clearlinux/kvmtool/commit/a8dee709f85735d16739d0eda0cc00d3c1b17477

Awesome - I was actually thinking about coding something like this!
In the last week I move the MIPS ELF loading out of mips/ into /elf.c to
be able to load kvm-unit-tests' tests (which are Multiboot/ELF). As
multiboot requires entering in protected mode, I was thinking about
changing kvmtool to support entering a guest directly in protected mode
- seems like this is mostly what you've done here already.

Looks like we should both post our patches to merge them somehow ;-)

Cheers,
Andre.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Andre Przywara
Hi Dimitri,

On 02/11/15 15:17, Dimitri John Ledkov wrote:
> On 2 November 2015 at 14:58, Will Deacon  wrote:
>> On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote:
>>> Hi,
>>
>> Hello Andre,
>>
>>> this series cleans up kvmtool's kernel loading functionality a bit.
>>> It has been broken out of a previous series I sent [1] and contains
>>> just the cleanup and bug fix parts, which should be less controversial
>>> and thus easier to merge ;-)
>>> I will resend the pipe loading part later on as a separate series.
>>>
>>> The first patch properly abstracts kernel loading to move
>>> responsibility into each architecture's code. It removes quite some
>>> ugly code from the generic kvm.c file.
>>> The later patches address the naive usage of read(2) to, well, read
>>> data from files. Doing this without coping with the subtleties of
>>> the UNIX read semantics (returning with less or none data read is not
>>> an error) can provoke hard to debug failures.
>>> So these patches make use of the existing and one new wrapper function
>>> to make sure we read everything we actually wanted to.
>>> The last patch moves the ARM kernel loading code into the proper
>>> location to be in line with the other architectures.
>>>
>>> Please have a look and give some comments!
>>
>> Looks good to me, but I'd like to see some comments from some mips/ppc/x86
>> people on the changes you're making over there.
> 
> Looks mostly good to me, as one of the kvmtool down streams. Over at
> https://github.com/clearlinux/kvmtool we have some patches to tweak
> the x86 boot flow, which will need rebasing/retweaking) specifically
> this commit here -
> https://github.com/clearlinux/kvmtool/commit/a8dee709f85735d16739d0eda0cc00d3c1b17477

Awesome - I was actually thinking about coding something like this!
In the last week I move the MIPS ELF loading out of mips/ into /elf.c to
be able to load kvm-unit-tests' tests (which are Multiboot/ELF). As
multiboot requires entering in protected mode, I was thinking about
changing kvmtool to support entering a guest directly in protected mode
- seems like this is mostly what you've done here already.

Looks like we should both post our patches to merge them somehow ;-)

Cheers,
Andre.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device

2015-11-02 Thread Vladimir Sementsov-Ogievskiy

On 02.11.2015 12:13, Xiao Guangrong wrote:

lseek can not work for all block devices as the man page says:
| Some devices are incapable of seeking and POSIX does not specify
| which devices must support lseek().

This patch tries to add the support on Linux by using BLKGETSIZE64
ioctl

Signed-off-by: Xiao Guangrong 
---
  util/osdep.c | 20 
  1 file changed, 20 insertions(+)

diff --git a/util/osdep.c b/util/osdep.c
index 5a61e19..b20c793 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -45,6 +45,11 @@
  extern int madvise(caddr_t, size_t, int);
  #endif
  
+#ifdef CONFIG_LINUX

+#include 
+#include 
+#endif
+
  #include "qemu-common.h"
  #include "qemu/sockets.h"
  #include "qemu/error-report.h"
@@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
  {
  int64_t size;
  
+#ifdef CONFIG_LINUX

+struct stat stat_buf;
+if (fstat(fd, _buf) < 0) {
+return -errno;
+}
+
+if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, )) {
+/* The size of block device is larger than max int64_t? */
+if (size < 0) {
+return -EOVERFLOW;
+}
+return size;
+}
+#endif
+
  size = lseek(fd, 0, SEEK_END);
  if (size < 0) {
  return -errno;


Reviewed-by: Vladimir Sementsov-Ogievskiy 

just a question: is there any use for stat.st_size ? Is it always worse 
then lseek? Does it work for blk?


also, "This patch tries to add..". Hmm. It looks like this patch is not 
sure about will it success. I'd prefer "This patch adds", but this is 
not important


--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device

2015-11-02 Thread Xiao Guangrong



On 11/03/2015 12:11 AM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

lseek can not work for all block devices as the man page says:
| Some devices are incapable of seeking and POSIX does not specify
| which devices must support lseek().

This patch tries to add the support on Linux by using BLKGETSIZE64
ioctl

Signed-off-by: Xiao Guangrong 
---
  util/osdep.c | 20 
  1 file changed, 20 insertions(+)

diff --git a/util/osdep.c b/util/osdep.c
index 5a61e19..b20c793 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -45,6 +45,11 @@
  extern int madvise(caddr_t, size_t, int);
  #endif
+#ifdef CONFIG_LINUX
+#include 
+#include 
+#endif
+
  #include "qemu-common.h"
  #include "qemu/sockets.h"
  #include "qemu/error-report.h"
@@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
  {
  int64_t size;
+#ifdef CONFIG_LINUX
+struct stat stat_buf;
+if (fstat(fd, _buf) < 0) {
+return -errno;
+}
+
+if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, )) {
+/* The size of block device is larger than max int64_t? */
+if (size < 0) {
+return -EOVERFLOW;
+}
+return size;
+}
+#endif
+
  size = lseek(fd, 0, SEEK_END);
  if (size < 0) {
  return -errno;


Reviewed-by: Vladimir Sementsov-Ogievskiy 

just a question: is there any use for stat.st_size ? Is it always worse then 
lseek?


The man page says:
The  st_size field gives the size of the file (if it is a regular file or a 
symbolic link)
in bytes.  The size of a symbolic link is the length of the pathname it 
contains, without a
terminating null byte.

So it can not work on symbolic link.


Does it work for
blk?



Quickly checked with a program written by myself and 'stat' command, the answer 
is NO. :)


also, "This patch tries to add..". Hmm. It looks like this patch is not sure 
about will it success.
I'd prefer "This patch adds", but this is not important



Thanks for your sharing. I did not know the different, now, i got it. :)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic

2015-11-02 Thread Radim Krcmar
2015-11-02 17:08+0100, Paolo Bonzini:
> On 02/11/2015 15:59, Radim Krcmar wrote:
>>> We do not want to do too much work in atomic context, in particular
>>> not walking all the VCPUs of the virtual machine.  So we want
>>> to distinguish the architecture-specific injection function for irqfd
>>> from kvm_set_msi.  Since it's still empty, reuse the newly added
>>> kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.
>> 
>> kvm/queue uses kvm_arch_set_irq since b7184313f4b9 ("kvm/x86: Hyper-V
>> synthetic interrupt controller").
>> 
>> Is synic going to be dropped before this patch is merged?
> 
> Yes.  Both because the Virtuozzo people confirmed that kvm_arch_set_irq
> isn't needed for synic, and because synic is currently broken with APICv.

Thanks.

(We can add direct delivery for |online vcpus| < X if performance with
 low number of VCPUs happens to regress because of the schedule_work,)

Reviewed-by: Radim Krčmář 

[2/3] and [3/3] are good too.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Shamir Rabinovitch
On Tue, Nov 03, 2015 at 07:13:28AM +1100, Benjamin Herrenschmidt wrote:
> 
> Then I would argue for naming this differently. Make it an optional
> hint "DMA_ATTR_HIGH_PERF" or something like that. Whether this is
> achieved via using a bypass or other means in the backend not the
> business of the driver.
> 

Correct. This comment was also from internal review :-) Although
currently there is strong opposition to such new attribute.

> 
> It will partially only but it's just an example of another way the
> bakcend could provide some improved performances without a bypass.

Just curious.. 

In the Infiniband case where user application request the driver to DMA 
map some random pages they allocated - will your suggestion still
function? 

I mean can you use this limited 1:1 mapping window to map any page the
user application choose? How?

At the bypass we just use the physical address of the page (almost as is).
We use the fact that bypass address space cover the whole memory and so we
have mapping for any address.

In IOMMU case we use the IOMMU translation tables and so can map 64 bit 
address to some 32 bit address (or less).

In your case it is not clear to me what can we do with such limited 1:1
mapping.

> 
> Cheers,
> Ben.
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kbuild-all] [PATCH 5/6] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU

2015-11-02 Thread Suresh E. Warrier
Hi Fengguang,

I understand the problem.

I will send you the URL for the powerpc git tree once my patches
are pulled in so that we can then pass that to the robot.

Thanks.
-suresh

On 11/01/2015 08:51 PM, Fengguang Wu wrote:
> Hi Suresh,
> 
> Sorry for the noise!
> 
> Do you have a git tree that the robot can monitor and test?
> 
> In this case of one patchset depending on another, there is no chance
> for the robot to do valid testing based on emailed patches.
> 
> Thanks,
> Fengguang
> 
> On Fri, Oct 30, 2015 at 10:16:06AM -0500, Suresh E. Warrier wrote:
>> This patch set depends upon a previous patch set that I had submitted to
>> linux-ppc. The URL for that is:
>>
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135794.html
>>
>> -suresh
>>
>> On 10/29/2015 11:52 PM, kbuild test robot wrote:
>>> Hi Suresh,
>>>
>>> [auto build test ERROR on kvm/linux-next -- if it's inappropriate base, 
>>> please suggest rules for selecting the more suitable base]
>>>
>>> url:
>>> https://github.com/0day-ci/linux/commits/Suresh-Warrier/KVM-PPC-Book3S-HV-Optimize-wakeup-VCPU-from-H_IPI/20151030-081329
>>> config: powerpc-defconfig (attached as .config)
>>> reproduce:
>>> wget 
>>> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>>>  -O ~/bin/make.cross
>>> chmod +x ~/bin/make.cross
>>> # save the attached .config to linux build tree
>>> make.cross ARCH=powerpc 
>>>
>>> All errors (new ones prefixed by >>):
>>>
>>>arch/powerpc/kvm/book3s_hv_rm_xics.c: In function 'icp_rm_set_vcpu_irq':
> arch/powerpc/kvm/book3s_hv_rm_xics.c:142:4: error: implicit declaration 
> of function 'smp_muxed_ipi_rm_message_pass' 
> [-Werror=implicit-function-declaration]
>>>smp_muxed_ipi_rm_message_pass(hcpu,
>>>^
>>>arch/powerpc/kvm/book3s_hv_rm_xics.c:143:7: error: 
>>> 'PPC_MSG_RM_HOST_ACTION' undeclared (first use in this function)
>>>   PPC_MSG_RM_HOST_ACTION);
>>>   ^
>>>arch/powerpc/kvm/book3s_hv_rm_xics.c:143:7: note: each undeclared 
>>> identifier is reported only once for each function it appears in
>>>cc1: all warnings being treated as errors
>>>
>>> vim +/smp_muxed_ipi_rm_message_pass +142 
>>> arch/powerpc/kvm/book3s_hv_rm_xics.c
>>>
>>>136  hcore = -1;
>>>137  if (kvmppc_host_rm_ops_hv)
>>>138  hcore = 
>>> find_available_hostcore(XICS_RM_KICK_VCPU);
>>>139  if (hcore != -1) {
>>>140  hcpu = hcore << threads_shift;
>>>141  
>>> kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
>>>  > 142  smp_muxed_ipi_rm_message_pass(hcpu,
>>>143  
>>> PPC_MSG_RM_HOST_ACTION);
>>>144  } else {
>>>145  this_icp->rm_action |= 
>>> XICS_RM_KICK_VCPU;
>>>
>>> ---
>>> 0-DAY kernel test infrastructureOpen Source Technology 
>>> Center
>>> https://lists.01.org/pipermail/kbuild-all   Intel 
>>> Corporation
>>>
>>
>> ___
>> kbuild-all mailing list
>> kbuild-...@lists.01.org
>> https://lists.01.org/mailman/listinfo/kbuild-all
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: fix SMEP and SMAP without EPT

2015-11-02 Thread Radim Krčmář
The comment in code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.

Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.

Signed-off-by: Radim Krčmář 
---
 arch/x86/kvm/vmx.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b680c2e0e8a3..ab598558a7a4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3788,20 +3788,21 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned 
long cr4)
if (!is_paging(vcpu)) {
hw_cr4 &= ~X86_CR4_PAE;
hw_cr4 |= X86_CR4_PSE;
-   /*
-* SMEP/SMAP is disabled if CPU is in non-paging mode
-* in hardware. However KVM always uses paging mode to
-* emulate guest non-paging mode with TDP.
-* To emulate this behavior, SMEP/SMAP needs to be
-* manually disabled when guest switches to non-paging
-* mode.
-*/
-   hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
} else if (!(cr4 & X86_CR4_PAE)) {
hw_cr4 &= ~X86_CR4_PAE;
}
}
 
+   if (!enable_unrestricted_guest && !is_paging(vcpu))
+   /*
+* SMEP/SMAP is disabled if CPU is in non-paging mode in
+* hardware.  However KVM always uses paging mode without
+* unrestricted guest.
+* To emulate this behavior, SMEP/SMAP needs to be manually
+* disabled when guest switches to non-paging mode.
+*/
+   hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+
vmcs_writel(CR4_READ_SHADOW, cr4);
vmcs_writel(GUEST_CR4, hw_cr4);
return 0;
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file

2015-11-02 Thread Paolo Bonzini


On 02/11/2015 10:13, Xiao Guangrong wrote:
> Currently, file_ram_alloc() only works on directory - it creates a file
> under @path and do mmap on it
> 
> This patch tries to allow it to work on file directly, if @path is a
> directory it works as before, otherwise it treats @path as the target
> file then directly allocate memory from it
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  exec.c | 80 
> ++
>  1 file changed, 51 insertions(+), 29 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 9075f4d..db0fdaf 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>  }
>  
>  #ifdef __linux__
> +static bool path_is_dir(const char *path)
> +{
> +struct stat fs;
> +
> +return stat(path, ) == 0 && S_ISDIR(fs.st_mode);
> +}
> +
> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
> +{
> +char *filename;
> +char *sanitized_name;
> +char *c;
> +int fd;
> +
> +if (!path_is_dir(path)) {
> +int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
> +
> +flags |= O_EXCL;
> +return open(path, flags);
> +}
> +
> +/* Make name safe to use with mkstemp by replacing '/' with '_'. */
> +sanitized_name = g_strdup(memory_region_name(block->mr));
> +for (c = sanitized_name; *c != '\0'; c++) {
> +if (*c == '/') {
> +*c = '_';
> +}
> +}
> +filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
> +   sanitized_name);
> +g_free(sanitized_name);
> +fd = mkstemp(filename);
> +if (fd >= 0) {
> +unlink(filename);
> +/*
> + * ftruncate is not supported by hugetlbfs in older
> + * hosts, so don't bother bailing out on errors.
> + * If anything goes wrong with it under other filesystems,
> + * mmap will fail.
> + */
> +if (ftruncate(fd, size)) {
> +perror("ftruncate");
> +}
> +}
> +g_free(filename);
> +
> +return fd;
> +}
> +
>  static void *file_ram_alloc(RAMBlock *block,
>  ram_addr_t memory,
>  const char *path,
>  Error **errp)
>  {
> -char *filename;
> -char *sanitized_name;
> -char *c;
>  void *area;
>  int fd;
>  uint64_t pagesize;
> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>  goto error;
>  }
>  
> -/* Make name safe to use with mkstemp by replacing '/' with '_'. */
> -sanitized_name = g_strdup(memory_region_name(block->mr));
> -for (c = sanitized_name; *c != '\0'; c++) {
> -if (*c == '/')
> -*c = '_';
> -}
> -
> -filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
> -   sanitized_name);
> -g_free(sanitized_name);
> +memory = ROUND_UP(memory, pagesize);
>  
> -fd = mkstemp(filename);
> +fd = open_ram_file_path(block, path, memory);
>  if (fd < 0) {
>  error_setg_errno(errp, errno,
>   "unable to create backing store for path %s", path);
> -g_free(filename);
>  goto error;
>  }
> -unlink(filename);
> -g_free(filename);
> -
> -memory = ROUND_UP(memory, pagesize);
> -
> -/*
> - * ftruncate is not supported by hugetlbfs in older
> - * hosts, so don't bother bailing out on errors.
> - * If anything goes wrong with it under other filesystems,
> - * mmap will fail.
> - */
> -if (ftruncate(fd, memory)) {
> -perror("ftruncate");
> -}
>  
>  area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>  if (area == MAP_FAILED) {
> 

I was going to send tomorrow a pull request for a similar patch,
"backends/hostmem-file: Allow to specify full pathname for backing file".

The main difference seems to be your usage of O_EXCL.  Can you explain
why you added it?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 17/21] KVM: ARM64: Add helper to handle PMCR register bits

2015-11-02 Thread Christopher Covington
On 10/30/2015 02:21 AM, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> According to ARMv8 spec, when writing 1 to PMCR.E, all counters are
> enabled by PMCNTENSET, while writing 0 to PMCR.E, all counters are
> disabled. When writing 1 to PMCR.P, reset all event counters, not
> including PMCCNTR, to zero. When writing 1 to PMCR.C, reset PMCCNTR to
> zero.

> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index ae21089..11d1bfb 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -121,6 +121,56 @@ void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 
> val)
>  }
>  
>  /**
> + * kvm_pmu_handle_pmcr - handle PMCR register
> + * @vcpu: The vcpu pointer
> + * @val: the value guest writes to PMCR register
> + */
> +void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val)
> +{
> + struct kvm_pmu *pmu = >arch.pmu;
> + struct kvm_pmc *pmc;
> + u32 enable;
> + int i;
> +
> + if (val & ARMV8_PMCR_E) {
> + if (!vcpu_mode_is_32bit(vcpu))
> + enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
> + else
> + enable = vcpu_cp15(vcpu, c9_PMCNTENSET);
> +
> + kvm_pmu_enable_counter(vcpu, enable, true);
> + } else
> + kvm_pmu_disable_counter(vcpu, 0xUL);

Nit: If using braces on one side of if-else, please use them on the other.
(Search for "braces in both branches" in Documentation/CodingStyle.)

> +
> + if (val & ARMV8_PMCR_C) {
> + pmc = >pmc[ARMV8_MAX_COUNTERS - 1];
> + if (pmc->perf_event)
> + local64_set(>perf_event->count, 0);
> + if (!vcpu_mode_is_32bit(vcpu))
> + vcpu_sys_reg(vcpu, PMCCNTR_EL0) = 0;
> + else
> + vcpu_cp15(vcpu, c9_PMCCNTR) = 0;
> + }
> +
> + if (val & ARMV8_PMCR_P) {
> + for (i = 0; i < ARMV8_MAX_COUNTERS - 1; i++) {
> + pmc = >pmc[i];
> + if (pmc->perf_event)
> + local64_set(>perf_event->count, 0);
> + if (!vcpu_mode_is_32bit(vcpu))
> + vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = 0;
> + else
> + vcpu_cp15(vcpu, c14_PMEVCNTR0 + i) = 0;
> + }
> + }
> +
> + if (val & ARMV8_PMCR_LC) {
> + pmc = >pmc[ARMV8_MAX_COUNTERS - 1];
> + pmc->bitmask = 0xUL;
> + }
> +}
> +
> +/**
>   * kvm_pmu_overflow_clear - clear PMU overflow interrupt
>   * @vcpu: The vcpu pointer
>   * @val: the value guest writes to PMOVSCLR register
> 


-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 08/21] KVM: ARM64: Add reset and access handlers for PMXEVTYPER register

2015-11-02 Thread Christopher Covington
Hi Shannon,

On 10/30/2015 02:21 AM, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Since the reset value of PMXEVTYPER is UNKNOWN, use reset_unknown or
> reset_unknown_cp15 for its reset handler. Add access handler which
> emulates writing and reading PMXEVTYPER register. When writing to
> PMXEVTYPER, call kvm_pmu_set_counter_event_type to create a perf_event
> for the selected event type.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index cb82b15..4e606ea 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -491,6 +491,17 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>  
>   if (p->is_write) {
>   switch (r->reg) {
> + case PMXEVTYPER_EL0: {
> + val = vcpu_sys_reg(vcpu, PMSELR_EL0);
> + kvm_pmu_set_counter_event_type(vcpu,
> +*vcpu_reg(vcpu, p->Rt),
> +val);
> + vcpu_sys_reg(vcpu, PMXEVTYPER_EL0) =
> +  *vcpu_reg(vcpu, p->Rt);

Why does PMXEVTYPER get set directly? It seems like it could have an accessor
that redirected to PMEVTYPER.

> + vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + val) =
> +  *vcpu_reg(vcpu, p->Rt);

I tried to look around briefly but couldn't find counter number range checking
in the PMSELR handler or here. Should there be some here and in PMXEVCNTR?

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/3] KVM: arm/arm64: Clean up some obsolete code

2015-11-02 Thread Christoffer Dall
On Tue, Oct 27, 2015 at 11:37:28AM +0300, Pavel Fedin wrote:
> Current KVM code has lots of old redundancies, which can be cleaned up.
> This patchset is actually a better alternative to
> http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows to
> keep piggy-backed LRs. The idea is based on the fact that our code also
> maintains LR state in elrsr, and this information is enough to track LR
> usage.
> 
> In case of problems this series can be applied partially, each patch is
> a complete refactoring step on its own.
> 
> Thanks to Andre Przywara for pinpointing some 4.3+ specifics.
> 
> This version has been tested on SMDK5410 development board
> (Exynos5410 SoC).

I ran this through my test scripts and I'm now quite sure that there's
some breakage in here.

One of my tests is running two VMs in parallel, each booting up, running
hackbench, and then doing reboot (from within the guest), and just
repeating like that.

I've run your patches in the above config 100 times, and every time,
the rebooting VMs got stuck before 50 reboots.

Without these patches, I could run the above config 100 times, and every
time, the rebooting VMs passed 200 reboots.

I'll try to test each patch individually soon.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-02 Thread Arnd Bergmann
On Tuesday 03 November 2015 07:13:28 Benjamin Herrenschmidt wrote:
> On Mon, 2015-11-02 at 14:07 +0200, Shamir Rabinovitch wrote:
> > On Mon, Nov 02, 2015 at 09:00:34PM +1100, Benjamin Herrenschmidt
> > wrote:
> > > 
> > > Chosing on a per-mapping basis *in the back end* might still make
> > > some
> > 
> > In my case, choosing mapping based on the hardware that will use this
> > mappings makes more sense. Most hardware are not that performance 
> > sensitive as the Infiniband hardware.
> 
>  ...
> 
> > The driver know for what hardware it is mapping the memory so it know 
> > if the memory will be used by performance sensitive hardware or not.
> 
> Then I would argue for naming this differently. Make it an optional
> hint "DMA_ATTR_HIGH_PERF" or something like that. Whether this is
> achieved via using a bypass or other means in the backend not the
> business of the driver.
> 

With a name like that, who wouldn't pass that flag? ;-)

Arnd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 09/35] exec: allow file_ram_alloc to work on file

2015-11-02 Thread Xiao Guangrong
Currently, file_ram_alloc() only works on directory - it creates a file
under @path and do mmap on it

This patch tries to allow it to work on file directly, if @path is a
directory it works as before, otherwise it treats @path as the target
file then directly allocate memory from it

Signed-off-by: Xiao Guangrong 
---
 exec.c | 80 ++
 1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 9075f4d..db0fdaf 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
 }
 
 #ifdef __linux__
+static bool path_is_dir(const char *path)
+{
+struct stat fs;
+
+return stat(path, ) == 0 && S_ISDIR(fs.st_mode);
+}
+
+static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
+{
+char *filename;
+char *sanitized_name;
+char *c;
+int fd;
+
+if (!path_is_dir(path)) {
+int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
+
+flags |= O_EXCL;
+return open(path, flags);
+}
+
+/* Make name safe to use with mkstemp by replacing '/' with '_'. */
+sanitized_name = g_strdup(memory_region_name(block->mr));
+for (c = sanitized_name; *c != '\0'; c++) {
+if (*c == '/') {
+*c = '_';
+}
+}
+filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
+   sanitized_name);
+g_free(sanitized_name);
+fd = mkstemp(filename);
+if (fd >= 0) {
+unlink(filename);
+/*
+ * ftruncate is not supported by hugetlbfs in older
+ * hosts, so don't bother bailing out on errors.
+ * If anything goes wrong with it under other filesystems,
+ * mmap will fail.
+ */
+if (ftruncate(fd, size)) {
+perror("ftruncate");
+}
+}
+g_free(filename);
+
+return fd;
+}
+
 static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 const char *path,
 Error **errp)
 {
-char *filename;
-char *sanitized_name;
-char *c;
 void *area;
 int fd;
 uint64_t pagesize;
@@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
 goto error;
 }
 
-/* Make name safe to use with mkstemp by replacing '/' with '_'. */
-sanitized_name = g_strdup(memory_region_name(block->mr));
-for (c = sanitized_name; *c != '\0'; c++) {
-if (*c == '/')
-*c = '_';
-}
-
-filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
-   sanitized_name);
-g_free(sanitized_name);
+memory = ROUND_UP(memory, pagesize);
 
-fd = mkstemp(filename);
+fd = open_ram_file_path(block, path, memory);
 if (fd < 0) {
 error_setg_errno(errp, errno,
  "unable to create backing store for path %s", path);
-g_free(filename);
 goto error;
 }
-unlink(filename);
-g_free(filename);
-
-memory = ROUND_UP(memory, pagesize);
-
-/*
- * ftruncate is not supported by hugetlbfs in older
- * hosts, so don't bother bailing out on errors.
- * If anything goes wrong with it under other filesystems,
- * mmap will fail.
- */
-if (ftruncate(fd, memory)) {
-perror("ftruncate");
-}
 
 area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
 if (area == MAP_FAILED) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-02 Thread Xiao Guangrong
Curretly, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Current code did not check the return value of get_memory_region as it
assumed the backend memory of pc-dimm is always properly initialized,
we make get_memory_region internally catch the case if something is
wrong

Signed-off-by: Xiao Guangrong 
---
 hw/mem/dimm.c|  3 ++-
 hw/mem/pc-dimm.c | 12 +++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 4a63409..498d380 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor *v, void 
*opaque,
 int64_t value;
 MemoryRegion *mr;
 DIMMDevice *dimm = DIMM(obj);
+DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
 
-mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+mr = ddc->get_memory_region(dimm);
 value = memory_region_size(mr);
 
 visit_type_int(v, , name, errp);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 38323e9..e6b6a9f 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -22,7 +22,17 @@
 
 static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
 {
-return host_memory_backend_get_memory(dimm->hostmem, _abort);
+Error *local_err = NULL;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(dimm->hostmem, _err);
+
+/*
+ * plug a pc-dimm device whose backend memory was not properly
+ * initialized?
+ */
+assert(!local_err && mr);
+return mr;
 }
 
 static void pc_dimm_class_init(ObjectClass *oc, void *data)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 16/35] pc-dimm: drop the prefix of pc-dimm

2015-11-02 Thread Xiao Guangrong
This patch is generated by this script:

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/PC_DIMM/DIMM/g"

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/PCDIMM/DIMM/g"

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/pc_dimm/dimm/g"

find ./ -name "trace-events" | xargs sed -i "s/pc-dimm/dimm/g"

It prepares the work which abstracts dimm device type for both pc-dimm and
nvdimm

Reviewed-by: Eric Blake 
Signed-off-by: Xiao Guangrong 
---
 hmp.c   |   2 +-
 hw/acpi/ich9.c  |   6 +-
 hw/acpi/memory_hotplug.c|  16 ++---
 hw/acpi/piix4.c |   6 +-
 hw/i386/pc.c|  32 -
 hw/mem/pc-dimm.c| 148 
 hw/ppc/spapr.c  |  18 ++---
 include/hw/mem/pc-dimm.h|  62 -
 numa.c  |   2 +-
 qapi-schema.json|   8 +--
 qmp.c   |   2 +-
 stubs/qmp_pc_dimm_device_list.c |   2 +-
 trace-events|   8 +--
 13 files changed, 156 insertions(+), 156 deletions(-)

diff --git a/hmp.c b/hmp.c
index 5048eee..5c617d2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1952,7 +1952,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
*qdict)
 MemoryDeviceInfoList *info_list = qmp_query_memory_devices();
 MemoryDeviceInfoList *info;
 MemoryDeviceInfo *value;
-PCDIMMDeviceInfo *di;
+DIMMDeviceInfo *di;
 
 for (info = info_list; info; info = info->next) {
 value = info->value;
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1c7fcfa..b0d6a67 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -440,7 +440,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, 
Error **errp)
 void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
 {
 if (pm->acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
 acpi_memory_plug_cb(>acpi_regs, pm->irq, >acpi_memory_hotplug,
 dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
@@ -455,7 +455,7 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, 
DeviceState *dev,
   Error **errp)
 {
 if (pm->acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
 acpi_memory_unplug_request_cb(>acpi_regs, pm->irq,
   >acpi_memory_hotplug, dev, errp);
 } else {
@@ -468,7 +468,7 @@ void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, 
DeviceState *dev,
   Error **errp)
 {
 if (pm->acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
 acpi_memory_unplug_cb(>acpi_memory_hotplug, dev, errp);
 } else {
 error_setg(errp, "acpi: device unplug for not supported device"
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ce428df..e687852 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -54,23 +54,23 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, 
hwaddr addr,
 o = OBJECT(mdev->dimm);
 switch (addr) {
 case 0x0: /* Lo part of phys address where DIMM is mapped */
-val = o ? object_property_get_int(o, PC_DIMM_ADDR_PROP, NULL) : 0;
+val = o ? object_property_get_int(o, DIMM_ADDR_PROP, NULL) : 0;
 trace_mhp_acpi_read_addr_lo(mem_st->selector, val);
 break;
 case 0x4: /* Hi part of phys address where DIMM is mapped */
-val = o ? object_property_get_int(o, PC_DIMM_ADDR_PROP, NULL) >> 32 : 
0;
+val = o ? object_property_get_int(o, DIMM_ADDR_PROP, NULL) >> 32 : 0;
 trace_mhp_acpi_read_addr_hi(mem_st->selector, val);
 break;
 case 0x8: /* Lo part of DIMM size */
-val = o ? object_property_get_int(o, PC_DIMM_SIZE_PROP, NULL) : 0;
+val = o ? object_property_get_int(o, DIMM_SIZE_PROP, NULL) : 0;
 trace_mhp_acpi_read_size_lo(mem_st->selector, val);
 break;
 case 0xc: /* Hi part of DIMM size */
-val = o ? object_property_get_int(o, PC_DIMM_SIZE_PROP, NULL) >> 32 : 
0;
+val = o ? object_property_get_int(o, DIMM_SIZE_PROP, NULL) >> 32 : 0;
 trace_mhp_acpi_read_size_hi(mem_st->selector, val);
 break;
 case 0x10: /* node proximity for _PXM method */
-val = o ? object_property_get_int(o, PC_DIMM_NODE_PROP, NULL) : 0;
+val = o ? object_property_get_int(o, DIMM_NODE_PROP, NULL) : 0;
 trace_mhp_acpi_read_pxm(mem_st->selector, val);
 break;
 case 0x14: /* pack and 

[PATCH v7 13/35] hostmem-file: use whole file size if possible

2015-11-02 Thread Xiao Guangrong
Use the whole file size if @size is not specified which is useful
if we want to directly pass a file to guest

Signed-off-by: Xiao Guangrong 
---
 backends/hostmem-file.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 9097a57..ea355c1 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -38,15 +38,29 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 {
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
 
-if (!backend->size) {
-error_setg(errp, "can't create backend with size 0");
-return;
-}
 if (!fb->mem_path) {
 error_setg(errp, "mem-path property not set");
 return;
 }
 
+if (!backend->size) {
+Error *local_err = NULL;
+
+/*
+ * use the whole file size if @size is not specified.
+ */
+backend->size = qemu_file_getlength(fb->mem_path, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
+if (!backend->size) {
+error_setg(errp, "can't create backend on the file whose size is 0");
+return;
+}
+
 backend->force_prealloc = mem_prealloc;
 memory_region_init_ram_from_file(>mr, OBJECT(backend),
  object_get_canonical_path(OBJECT(backend)),
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 03/35] acpi: add aml_create_field

2015-11-02 Thread Xiao Guangrong
Implement CreateField term which is used by NVDIMM _DSM method in later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 13 +
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a72214d..9fe5e7b 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1151,6 +1151,19 @@ Aml *aml_sizeof(Aml *arg)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
+{
+Aml *var = aml_alloc();
+build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+aml_append(var, srcbuf);
+aml_append(var, index);
+aml_append(var, len);
+build_append_namestring(var->buf, "%s", name);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 7296efb..7e1c43b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -276,6 +276,7 @@ Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 04/35] acpi: add aml_concatenate

2015-11-02 Thread Xiao Guangrong
Implement Concatenate term which is used by NVDIMM _DSM method
in later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 14 ++
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9fe5e7b..efc06ab 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1164,6 +1164,20 @@ Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, 
const char *name)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefConcat */
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target)
+{
+Aml *var = aml_opcode(0x73 /* ConcatOp */);
+aml_append(var, source1);
+aml_append(var, source2);
+
+if (target) {
+aml_append(var, target);
+}
+
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 7e1c43b..325782d 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -277,6 +277,7 @@ Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
 Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 01/35] acpi: add aml_derefof

2015-11-02 Thread Xiao Guangrong
Implement DeRefOf term which is used by NVDIMM _DSM method in later patch

Reviewed-by: Igor Mammedov 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 0d4b324..cbd53f4 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
+Aml *aml_derefof(Aml *arg)
+{
+Aml *var = aml_opcode(0x83 /* DerefOfOp */);
+aml_append(var, arg);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 1b632dc..5a03d33 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -274,6 +274,7 @@ Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const 
char *name);
 Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
+Aml *aml_derefof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 05/35] acpi: add aml_object_type

2015-11-02 Thread Xiao Guangrong
Implement ObjectType which is used by NVDIMM _DSM method in
later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index efc06ab..9f792ab 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1178,6 +1178,14 @@ Aml *aml_concatenate(Aml *source1, Aml *source2, Aml 
*target)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefObjectType */
+Aml *aml_object_type(Aml *object)
+{
+Aml *var = aml_opcode(0x8E /* ObjectTypeOp */);
+aml_append(var, object);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 325782d..5b8a118 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -278,6 +278,7 @@ Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
 Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
+Aml *aml_object_type(Aml *object);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 07/35] util: introduce qemu_file_get_page_size()

2015-11-02 Thread Xiao Guangrong
There are three places use the some logic to get the page size on
the file path or file fd

Windows did not support file hugepage, so it will return normal page
for this case. And this interface has not been used on windows so far
 
This patch introduces qemu_file_get_page_size() to unify the code

Signed-off-by: Xiao Guangrong 
---
 exec.c   | 33 ++---
 include/qemu/osdep.h |  1 +
 target-ppc/kvm.c | 23 +--
 util/oslib-posix.c   | 37 +
 util/oslib-win32.c   |  5 +
 5 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/exec.c b/exec.c
index 8af2570..9de38be 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
 }
 
 #ifdef __linux__
-
-#include 
-
-#define HUGETLBFS_MAGIC   0x958458f6
-
-static long gethugepagesize(const char *path, Error **errp)
-{
-struct statfs fs;
-int ret;
-
-do {
-ret = statfs(path, );
-} while (ret != 0 && errno == EINTR);
-
-if (ret != 0) {
-error_setg_errno(errp, errno, "failed to get page size of file %s",
- path);
-return 0;
-}
-
-if (fs.f_type != HUGETLBFS_MAGIC)
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
-
-return fs.f_bsize;
-}
-
 static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 const char *path,
@@ -1213,11 +1187,16 @@ static void *file_ram_alloc(RAMBlock *block,
 uint64_t hpagesize;
 Error *local_err = NULL;
 
-hpagesize = gethugepagesize(path, _err);
+hpagesize = qemu_file_get_page_size(path, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error;
 }
+
+if (hpagesize == getpagesize()) {
+fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+}
+
 block->mr->align = hpagesize;
 
 if (memory < hpagesize) {
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b568424..dbc17dc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
  */
 pid_t qemu_fork(Error **errp);
 
+size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
 #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ac70f08..d8760ea 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -308,28 +308,15 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct 
kvm_ppc_smmu_info *info)
 
 static long gethugepagesize(const char *mem_path)
 {
-struct statfs fs;
-int ret;
-
-do {
-ret = statfs(mem_path, );
-} while (ret != 0 && errno == EINTR);
+Error *local_err = NULL;
+long size = qemu_file_get_page_size(mem_path, local_err);
 
-if (ret != 0) {
-fprintf(stderr, "Couldn't statfs() memory path: %s\n",
-strerror(errno));
+if (local_err) {
+error_report_err(local_err);
 exit(1);
 }
 
-#define HUGETLBFS_MAGIC   0x958458f6
-
-if (fs.f_type != HUGETLBFS_MAGIC) {
-/* Explicit mempath, but it's ordinary pages */
-return getpagesize();
-}
-
-/* It's hugepage, return the huge page size */
-return fs.f_bsize;
+return size;
 }
 
 static int find_max_supported_pagesize(Object *obj, void *opaque)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 914cef5..51437ff 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
 siglongjmp(sigjump, 1);
 }
 
-static size_t fd_getpagesize(int fd)
+static size_t fd_getpagesize(int fd, Error **errp)
 {
 #ifdef CONFIG_LINUX
 struct statfs fs;
@@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
 ret = fstatfs(fd, );
 } while (ret != 0 && errno == EINTR);
 
-if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
+if (ret) {
+error_setg_errno(errp, errno, "fstatfs is failed");
+return 0;
+}
+
+if (fs.f_type == HUGETLBFS_MAGIC) {
 return fs.f_bsize;
 }
 }
@@ -360,6 +365,22 @@ static size_t fd_getpagesize(int fd)
 return getpagesize();
 }
 
+size_t qemu_file_get_page_size(const char *path, Error **errp)
+{
+size_t size = 0;
+int fd = qemu_open(path, O_RDONLY);
+
+if (fd < 0) {
+error_setg_file_open(errp, errno, path);
+goto exit;
+}
+
+size = fd_getpagesize(fd, errp);
+qemu_close(fd);
+exit:
+return size;
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory)
 {
 int ret;
@@ -387,8 +408,16 @@ void os_mem_prealloc(int fd, char *area, size_t memory)
 exit(1);
 } else {
 int i;
-size_t hpagesize = fd_getpagesize(fd);
-size_t numpages = DIV_ROUND_UP(memory, hpagesize);
+Error *local_err = NULL;
+size_t hpagesize = fd_getpagesize(fd, _err);
+ 

[PATCH v7 02/35] acpi: add aml_sizeof

2015-11-02 Thread Xiao Guangrong
Implement SizeOf term which is used by NVDIMM _DSM method in later patch

Reviewed-by: Igor Mammedov 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index cbd53f4..a72214d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
+Aml *aml_sizeof(Aml *arg)
+{
+Aml *var = aml_opcode(0x87 /* SizeOfOp */);
+aml_append(var, arg);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 5a03d33..7296efb 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -275,6 +275,7 @@ Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
+Aml *aml_sizeof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 06/35] acpi: add aml_method_serialized

2015-11-02 Thread Xiao Guangrong
It avoid explicit Mutex and will be used by NVDIMM ACPI

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 26 --
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9f792ab..8bee8b2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -696,14 +696,36 @@ Aml *aml_while(Aml *predicate)
 }
 
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
-Aml *aml_method(const char *name, int arg_count)
+static Aml *__aml_method(const char *name, int arg_count, bool serialized)
 {
 Aml *var = aml_bundle(0x14 /* MethodOp */, AML_PACKAGE);
+int methodflags;
+
+/*
+ * MethodFlags:
+ *   bit 0-2: ArgCount (0-7)
+ *   bit 3: SerializeFlag
+ * 0: NotSerialized
+ * 1: Serialized
+ *   bit 4-7: reserved (must be 0)
+ */
+assert(!(arg_count & ~7));
+methodflags = arg_count | (serialized << 3);
 build_append_namestring(var->buf, "%s", name);
-build_append_byte(var->buf, arg_count); /* MethodFlags: ArgCount */
+build_append_byte(var->buf, methodflags);
 return var;
 }
 
+Aml *aml_method(const char *name, int arg_count)
+{
+return __aml_method(name, arg_count, false);
+}
+
+Aml *aml_method_serialized(const char *name, int arg_count)
+{
+return __aml_method(name, arg_count, true);
+}
+
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefDevice */
 Aml *aml_device(const char *name_format, ...)
 {
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 5b8a118..00cf40e 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -263,6 +263,7 @@ Aml *aml_qword_memory(AmlDecode dec, AmlMinFixed min_fixed,
 Aml *aml_scope(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
 Aml *aml_device(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
 Aml *aml_method(const char *name, int arg_count);
+Aml *aml_method_serialized(const char *name, int arg_count);
 Aml *aml_if(Aml *predicate);
 Aml *aml_else(void);
 Aml *aml_while(Aml *predicate);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 08/35] exec: allow memory to be allocated from any kind of path

2015-11-02 Thread Xiao Guangrong
Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
locates at DAX enabled filesystem

So this patch let it work on any kind of path

Signed-off-by: Xiao Guangrong 
---
 exec.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9de38be..9075f4d 100644
--- a/exec.c
+++ b/exec.c
@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
 char *c;
 void *area;
 int fd;
-uint64_t hpagesize;
+uint64_t pagesize;
 Error *local_err = NULL;
 
-hpagesize = qemu_file_get_page_size(path, _err);
+pagesize = qemu_file_get_page_size(path, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error;
 }
 
-if (hpagesize == getpagesize()) {
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+if (pagesize == getpagesize()) {
+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
 }
 
-block->mr->align = hpagesize;
+block->mr->align = pagesize;
 
-if (memory < hpagesize) {
+if (memory < pagesize) {
 error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
-   "or larger than huge page size 0x%" PRIx64,
-   memory, hpagesize);
+   "or larger than page size 0x%" PRIx64,
+   memory, pagesize);
 goto error;
 }
 
@@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
 fd = mkstemp(filename);
 if (fd < 0) {
 error_setg_errno(errp, errno,
- "unable to create backing store for hugepages");
+ "unable to create backing store for path %s", path);
 g_free(filename);
 goto error;
 }
 unlink(filename);
 g_free(filename);
 
-memory = ROUND_UP(memory, hpagesize);
+memory = ROUND_UP(memory, pagesize);
 
 /*
  * ftruncate is not supported by hugetlbfs in older
@@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
 perror("ftruncate");
 }
 
-area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
+area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
 if (area == MAP_FAILED) {
 error_setg_errno(errp, errno,
- "unable to map backing store for hugepages");
+ "unable to map backing store for path %s", path);
 close(fd);
 goto error;
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 14/35] pc-dimm: remove DEFAULT_PC_DIMMSIZE

2015-11-02 Thread Xiao Guangrong
It's not used any more

Signed-off-by: Xiao Guangrong 
---
 include/hw/mem/pc-dimm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index d83bf30..11a8937 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -20,8 +20,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/qdev.h"
 
-#define DEFAULT_PC_DIMMSIZE (1024*1024*1024)
-
 #define TYPE_PC_DIMM "pc-dimm"
 #define PC_DIMM(obj) \
 OBJECT_CHECK(PCDIMMDevice, (obj), TYPE_PC_DIMM)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 10/35] hostmem-file: clean up memory allocation

2015-11-02 Thread Xiao Guangrong
- hostmem-file.c is compiled only if CONFIG_LINUX is enabled so that is
  unnecessary to do the same check in the source file

- the interface, HostMemoryBackendClass->alloc(), is not called many
  times, do not need to check if the memory-region is initialized

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Xiao Guangrong 
---
 backends/hostmem-file.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e9b6d21..9097a57 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -46,17 +46,12 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 error_setg(errp, "mem-path property not set");
 return;
 }
-#ifndef CONFIG_LINUX
-error_setg(errp, "-mem-path not supported on this host");
-#else
-if (!memory_region_size(>mr)) {
-backend->force_prealloc = mem_prealloc;
-memory_region_init_ram_from_file(>mr, OBJECT(backend),
+
+backend->force_prealloc = mem_prealloc;
+memory_region_init_ram_from_file(>mr, OBJECT(backend),
  object_get_canonical_path(OBJECT(backend)),
  backend->size, fb->share,
  fb->mem_path, errp);
-}
-#endif
 }
 
 static void
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 15/35] pc-dimm: make pc_existing_dimms_capacity static and rename it

2015-11-02 Thread Xiao Guangrong
pc_existing_dimms_capacity() can be static since it is not used out of
pc-dimm.c and drop the pc_ prefix to prepare the work which abstracts
dimm device type from pc-dimm

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Xiao Guangrong 
---
 hw/mem/pc-dimm.c | 73 
 include/hw/mem/pc-dimm.h |  1 -
 2 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 80f424b..2dcbbcd 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -32,6 +32,38 @@ typedef struct pc_dimms_capacity {
  Error**errp;
 } pc_dimms_capacity;
 
+static int existing_dimms_capacity_internal(Object *obj, void *opaque)
+{
+pc_dimms_capacity *cap = opaque;
+uint64_t *size = >size;
+
+if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+DeviceState *dev = DEVICE(obj);
+
+if (dev->realized) {
+(*size) += object_property_get_int(obj, PC_DIMM_SIZE_PROP,
+cap->errp);
+}
+
+if (cap->errp && *cap->errp) {
+return 1;
+}
+}
+object_child_foreach(obj, existing_dimms_capacity_internal, opaque);
+return 0;
+}
+
+static uint64_t existing_dimms_capacity(Error **errp)
+{
+pc_dimms_capacity cap;
+
+cap.size = 0;
+cap.errp = errp;
+
+existing_dimms_capacity_internal(qdev_get_machine(), );
+return cap.size;
+}
+
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
  MemoryRegion *mr, uint64_t align, Error **errp)
 {
@@ -39,7 +71,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState 
*hpms,
 MachineState *machine = MACHINE(qdev_get_machine());
 PCDIMMDevice *dimm = PC_DIMM(dev);
 Error *local_err = NULL;
-uint64_t existing_dimms_capacity = 0;
+uint64_t dimms_capacity = 0;
 uint64_t addr;
 
 addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, 
_err);
@@ -55,17 +87,16 @@ void pc_dimm_memory_plug(DeviceState *dev, 
MemoryHotplugState *hpms,
 goto out;
 }
 
-existing_dimms_capacity = pc_existing_dimms_capacity(_err);
+dimms_capacity = existing_dimms_capacity(_err);
 if (local_err) {
 goto out;
 }
 
-if (existing_dimms_capacity + memory_region_size(mr) >
+if (dimms_capacity + memory_region_size(mr) >
 machine->maxram_size - machine->ram_size) {
 error_setg(_err, "not enough space, currently 0x%" PRIx64
" in use of total hot pluggable 0x" RAM_ADDR_FMT,
-   existing_dimms_capacity,
-   machine->maxram_size - machine->ram_size);
+   dimms_capacity, machine->maxram_size - machine->ram_size);
 goto out;
 }
 
@@ -120,38 +151,6 @@ void pc_dimm_memory_unplug(DeviceState *dev, 
MemoryHotplugState *hpms,
 vmstate_unregister_ram(mr, dev);
 }
 
-static int pc_existing_dimms_capacity_internal(Object *obj, void *opaque)
-{
-pc_dimms_capacity *cap = opaque;
-uint64_t *size = >size;
-
-if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
-DeviceState *dev = DEVICE(obj);
-
-if (dev->realized) {
-(*size) += object_property_get_int(obj, PC_DIMM_SIZE_PROP,
-cap->errp);
-}
-
-if (cap->errp && *cap->errp) {
-return 1;
-}
-}
-object_child_foreach(obj, pc_existing_dimms_capacity_internal, opaque);
-return 0;
-}
-
-uint64_t pc_existing_dimms_capacity(Error **errp)
-{
-pc_dimms_capacity cap;
-
-cap.size = 0;
-cap.errp = errp;
-
-pc_existing_dimms_capacity_internal(qdev_get_machine(), );
-return cap.size;
-}
-
 int qmp_pc_dimm_device_list(Object *obj, void *opaque)
 {
 MemoryDeviceInfoList ***prev = opaque;
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 11a8937..8a43548 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -87,7 +87,6 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
 int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
 int qmp_pc_dimm_device_list(Object *obj, void *opaque);
-uint64_t pc_existing_dimms_capacity(Error **errp);
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
  MemoryRegion *mr, uint64_t align, Error **errp);
 void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 34/35] nvdimm acpi: support _FIT method

2015-11-02 Thread Xiao Guangrong
FIT buffer is not completely mapped into guest address space, so a new
function, Read FIT, function index 0x, is reserved by QEMU to
read the piece of FIT buffer. The buffer is concatenated before _FIT
return

Refer to docs/specs/acpi-nvdimm.txt for detailed design

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 168 +--
 1 file changed, 164 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index eab5d9c..f5b5c12 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -384,6 +384,18 @@ static void nvdimm_build_nfit(GSList *device_list, GArray 
*table_offsets,
 g_array_free(structures, true);
 }
 
+/*
+ * define UUID for NVDIMM Root Device according to Chapter 3 DSM Interface
+ * for NVDIMM Root Device - Example in DSM Spec Rev1.
+ */
+#define NVDIMM_DSM_ROOT_UUID "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
+
+/*
+ * Read FIT Function, which is a QEMU internal use only function, more detail
+ * refer to docs/specs/acpi_nvdimm.txt
+ */
+#define NVDIMM_DSM_FUNC_READ_FIT 0x
+
 /* define NVDIMM DSM return status codes according to DSM Spec Rev1. */
 enum {
 /* Common return status codes. */
@@ -420,6 +432,11 @@ struct NvdimmFuncInSetLabelData {
 } QEMU_PACKED;
 typedef struct NvdimmFuncInSetLabelData NvdimmFuncInSetLabelData;
 
+struct NvdimmFuncInReadFit {
+uint32_t offset; /* fit offset */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInReadFit NvdimmFuncInReadFit;
+
 struct NvdimmDsmIn {
 uint32_t handle;
 uint32_t revision;
@@ -429,6 +446,7 @@ struct NvdimmDsmIn {
 uint8_t arg3[0];
 NvdimmFuncInSetLabelData func_set_label_data;
 NvdimmFuncInGetLabelData func_get_label_data;
+NvdimmFuncInReadFit func_read_fit;
 };
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
@@ -450,13 +468,71 @@ struct NvdimmFuncOutGetLabelData {
 } QEMU_PACKED;
 typedef struct NvdimmFuncOutGetLabelData NvdimmFuncOutGetLabelData;
 
+struct NvdimmFuncOutReadFit {
+uint32_t status;/* return status code. */
+uint32_t length;/* the length of fit data we read. */
+uint8_t fit_data[0]; /* fit data. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutReadFit NvdimmFuncOutReadFit;
+
 static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
 {
 status = cpu_to_le32(status);
 build_append_int_noprefix(out, status, sizeof(status));
 }
 
-static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
+/* Build fit memory which is presented to guest via _FIT method. */
+static void nvdimm_build_fit(AcpiNVDIMMState *state)
+{
+if (!state->fit) {
+GSList *device_list = nvdimm_get_plugged_device_list();
+
+nvdimm_debug("Rebuild FIT...\n");
+state->fit = nvdimm_build_device_structure(device_list);
+g_slist_free(device_list);
+}
+}
+
+/* Read FIT data, defined in docs/specs/acpi_nvdimm.txt. */
+static void nvdimm_dsm_func_read_fit(AcpiNVDIMMState *state,
+ NvdimmDsmIn *in, GArray *out)
+{
+NvdimmFuncInReadFit *read_fit = >func_read_fit;
+NvdimmFuncOutReadFit fit_out;
+uint32_t read_length = TARGET_PAGE_SIZE - sizeof(NvdimmFuncOutReadFit);
+uint32_t status = NVDIMM_DSM_ROOT_DEV_STATUS_INVALID_PARAS;
+
+nvdimm_build_fit(state);
+
+le32_to_cpus(_fit->offset);
+
+nvdimm_debug("Read FIT offset %#x.\n", read_fit->offset);
+
+if (read_fit->offset > state->fit->len) {
+nvdimm_debug("offset %#x is beyond fit size (%#x).\n",
+ read_fit->offset, state->fit->len);
+goto exit;
+}
+
+read_length = MIN(read_length, state->fit->len - read_fit->offset);
+nvdimm_debug("read length %#x.\n", read_length);
+
+fit_out.status = cpu_to_le32(NVDIMM_DSM_STATUS_SUCCESS);
+fit_out.length = cpu_to_le32(read_length);
+g_array_append_vals(out, _out, sizeof(fit_out));
+
+if (read_length) {
+g_array_append_vals(out, state->fit->data + read_fit->offset,
+read_length);
+}
+return;
+
+exit:
+nvdimm_dsm_write_status(out, status);
+}
+
+static void nvdimm_dsm_root(AcpiNVDIMMState *state, NvdimmDsmIn *in,
+GArray *out)
 {
 uint32_t status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
 
@@ -475,6 +551,10 @@ static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
 return;
 }
 
+if (in->function == NVDIMM_DSM_FUNC_READ_FIT /* FIT Read */) {
+return nvdimm_dsm_func_read_fit(state, in, out);
+}
+
 nvdimm_debug("Return status %#x.\n", status);
 nvdimm_dsm_write_status(out, status);
 }
@@ -710,7 +790,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 
 /* Handle 0 is reserved for NVDIMM Root Device. */
 if (!in->handle) {
-nvdimm_dsm_root(in, out);
+nvdimm_dsm_root(state, in, out);
 goto exit;
 }
 
@@ -927,8 +1007,88 @@ static void 

[PATCH v7 35/35] nvdimm: add maintain info

2015-11-02 Thread Xiao Guangrong
Add NVDIMM maintainer

Signed-off-by: Xiao Guangrong 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3144113..865c0cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -907,6 +907,13 @@ M: Jiri Pirko 
 S: Maintained
 F: hw/net/rocker/
 
+NVDIMM
+M: Xiao Guangrong 
+S: Maintained
+F: hw/acpi/nvdimm.c
+F: hw/mem/nvdimm.c
+F: include/hw/mem/nvdimm.h
+
 Subsystems
 --
 Audio
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-02 Thread Haozhong Zhang
The value of the migrated vcpu's TSC rate is determined as below.
 1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
user-specified value will be used.
 2. If neither a user-specified TSC rate nor a migrated TSC rate is
present, we will use the TSC rate from KVM (returned by
KVM_GET_TSC_KHZ).
 3. Otherwise, we will use the migrated TSC rate.

Signed-off-by: Haozhong Zhang 
---
 include/sysemu/kvm.h |  2 ++
 kvm-all.c|  1 +
 target-arm/kvm.c |  5 +
 target-i386/kvm.c| 33 +
 target-mips/kvm.c|  5 +
 target-ppc/kvm.c |  5 +
 target-s390x/kvm.c   |  5 +
 7 files changed, 56 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 461ef65..0ec8b98 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -328,6 +328,8 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry 
*route,
 
 int kvm_arch_msi_data_to_gsi(uint32_t data);
 
+int kvm_arch_setup_tsc_khz(CPUState *cpu);
+
 int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
diff --git a/kvm-all.c b/kvm-all.c
index c442838..1ecaf04 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1757,6 +1757,7 @@ static void do_kvm_cpu_synchronize_post_init(void *arg)
 {
 CPUState *cpu = arg;
 
+kvm_arch_setup_tsc_khz(cpu);
 kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
 cpu->kvm_vcpu_dirty = false;
 }
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 79ef4c6..a724f6d 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -614,3 +614,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return (data - 32) & 0x;
 }
+
+int kvm_arch_setup_tsc_khz(CPUState *cs)
+{
+return 0;
+}
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 64046cb..aae5e58 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 abort();
 }
+
+int kvm_arch_setup_tsc_khz(CPUState *cs)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = >env;
+int r;
+
+/*
+ * Prepare vcpu's TSC rate to be migrated.
+ *
+ * - If the user specifies the TSC rate by cpu option 'tsc-freq',
+ *   we will use the user-specified value.
+ *
+ * - If there is neither user-specified TSC rate nor migrated TSC
+ *   rate, we will ask KVM for the TSC rate by calling
+ *   KVM_GET_TSC_KHZ.
+ *
+ * - Otherwise, if there is a migrated TSC rate, we will use the
+ *   migrated value.
+ */
+if (env->tsc_khz) {
+env->tsc_khz_saved = env->tsc_khz;
+} else if (!env->tsc_khz_saved) {
+r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
+if (r < 0) {
+fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
+return r;
+}
+env->tsc_khz_saved = r;
+}
+
+return 0;
+}
diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 12d7db3..fb26d7e 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -687,3 +687,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 abort();
 }
+
+int kvm_arch_setup_tsc_khz(CPUState *cs)
+{
+return 0;
+}
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ac70f08..c429f0c 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2510,3 +2510,8 @@ int kvmppc_enable_hwrng(void)
 
 return kvmppc_enable_hcall(kvm_state, H_RANDOM);
 }
+
+int kvm_arch_setup_tsc_khz(CPUState *cs)
+{
+return 0;
+}
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index c3be180..db5d436 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -2248,3 +2248,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 abort();
 }
+
+int kvm_arch_setup_tsc_khz(CPUState *cs)
+{
+return 0;
+}
-- 
2.4.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices

2015-11-02 Thread Xiao Guangrong
NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices

There is a root device under \_SB and specified NVDIMM devices are under the
root device. Each NVDIMM device has _ADR which returns its handle used to
associate MEMDEV structure in NFIT

We reserve handle 0 for root device. In this patch, we save handle, handle,
arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 184 +++
 1 file changed, 184 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index dd84e5f..53ed675 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray 
*table_offsets,
 g_array_free(structures, true);
 }
 
+struct NvdimmDsmIn {
+uint32_t handle;
+uint32_t revision;
+uint32_t function;
+   /* the remaining size in the page is used by arg3. */
+uint8_t arg3[0];
+} QEMU_PACKED;
+typedef struct NvdimmDsmIn NvdimmDsmIn;
+
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 static void
 nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
+fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
 }
 
 static const MemoryRegionOps nvdimm_dsm_ops = {
@@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, 
MemoryRegion *io,
 memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, >io_mr);
 }
 
+#define BUILD_STA_METHOD(_dev_, _method_)  \
+do {   \
+_method_ = aml_method("_STA", 0);  \
+aml_append(_method_, aml_return(aml_int(0x0f)));   \
+aml_append(_dev_, _method_);   \
+} while (0)
+
+#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)\
+do {   \
+Aml *ifctx, *uuid; \
+_method_ = aml_method("_DSM", 4);  \
+/* check UUID if it is we expect, return the errorcode if not.*/   \
+uuid = aml_touuid(_uuid_); \
+ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid))); \
+aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */))); \
+aml_append(method, ifctx); \
+aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
+   aml_arg(1), aml_arg(2), aml_arg(3;  \
+aml_append(_dev_, _method_);   \
+} while (0)
+
+#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_) \
+aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
+
+#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_) \
+BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
+
+static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
+{
+for (; device_list; device_list = device_list->next) {
+NVDIMMDevice *nvdimm = device_list->data;
+int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+   NULL);
+uint32_t handle = nvdimm_slot_to_handle(slot);
+Aml *dev, *method;
+
+dev = aml_device("NV%02X", slot);
+aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
+
+BUILD_STA_METHOD(dev, method);
+
+/*
+ * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
+ * in DSM Spec Rev1.
+ */
+BUILD_DSM_METHOD(dev, method,
+ handle /* NVDIMM Device Handle */,
+ "4309AC30-0D11-11E4-9191-0800200C9A66"
+ /* UUID for NVDIMM Devices. */);
+
+aml_append(root_dev, dev);
+}
+}
+
+static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
+{
+Aml *dev, *method, *field;
+uint64_t page_size = TARGET_PAGE_SIZE;
+
+dev = aml_device("NVDR");
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
+
+/* map DSM memory and IO into ACPI namespace. */
+aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
+   NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
+aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
+   NVDIMM_ACPI_MEM_BASE, page_size));
+
+/*
+ * DSM notifier:
+ * @NOTI: Read it will notify QEMU that _DSM method is being
+ *called and the parameters can be found in NvdimmDsmIn.
+ *The value read 

[PATCH v7 24/35] docs: add NVDIMM ACPI documentation

2015-11-02 Thread Xiao Guangrong
It describes the basic concepts of NVDIMM ACPI and the interface
between QEMU and the ACPI BIOS

Signed-off-by: Xiao Guangrong 
---
 docs/specs/acpi_nvdimm.txt | 179 +
 1 file changed, 179 insertions(+)
 create mode 100644 docs/specs/acpi_nvdimm.txt

diff --git a/docs/specs/acpi_nvdimm.txt b/docs/specs/acpi_nvdimm.txt
new file mode 100644
index 000..cc5db2c
--- /dev/null
+++ b/docs/specs/acpi_nvdimm.txt
@@ -0,0 +1,179 @@
+QEMU<->ACPI BIOS NVDIMM interface
+-
+
+QEMU supports NVDIMM via ACPI. This document describes the basic concepts of
+NVDIMM ACPI and the interface between QEMU and the ACPI BIOS.
+
+NVDIMM ACPI Background
+--
+NVDIMM is introduced in ACPI 6.0 which defines an NVDIMM root device under
+_SB scope with a _HID of “ACPI0012”. For each NVDIMM present or intended
+to be supported by platform, platform firmware also exposes an ACPI
+Namespace Device under the root device.
+
+The NVDIMM child devices under the NVDIMM root device are defined with _ADR
+corresponding to the NFIT device handle. The NVDIMM root device and the
+NVDIMM devices can have device specific methods (_DSM) to provide additional
+functions specific to a particular NVDIMM implementation.
+
+This is an example from ACPI 6.0, a platform contains one NVDIMM:
+
+Scope (\_SB){
+   Device (NVDR) // Root device
+   {
+  Name (_HID, “ACPI0012”)
+  Method (_STA) {...}
+  Method (_FIT) {...}
+  Method (_DSM, ...) {...}
+  Device (NVD)
+  {
+ Name(_ADR, h) //where h is NFIT Device Handle for this NVDIMM
+ Method (_DSM, ...) {...}
+  }
+   }
+}
+
+Methods supported on both NVDIMM root device and NVDIMM device are
+1) _STA(Status)
+   It returns the current status of a device, which can be one of the
+   following: enabled, disabled, or removed.
+
+   Arguments: None
+
+   Return Value:
+   It returns an An Integer which is defined as followings:
+   Bit [0] – Set if the device is present.
+   Bit [1] – Set if the device is enabled and decoding its resources.
+   Bit [2] – Set if the device should be shown in the UI.
+   Bit [3] – Set if the device is functioning properly (cleared if device
+ failed its diagnostics).
+   Bit [4] – Set if the battery is present.
+   Bits [31:5] – Reserved (must be cleared).
+
+2) _DSM (Device Specific Method)
+   It is a control method that enables devices to provide device specific
+   control functions that are consumed by the device driver.
+   The NVDIMM DSM specification can be found at:
+http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+
+   Arguments:
+   Arg0 – A Buffer containing a UUID (16 Bytes)
+   Arg1 – An Integer containing the Revision ID (4 Bytes)
+   Arg2 – An Integer containing the Function Index (4 Bytes)
+   Arg3 – A package containing parameters for the function specified by the
+  UUID, Revision ID, and Function Index
+
+   Return Value:
+   If Function Index = 0, a Buffer containing a function index bitfield.
+   Otherwise, the return value and type depends on the UUID, revision ID
+   and function index which are described in the DSM specification.
+
+Methods on NVDIMM ROOT Device
+_FIT(Firmware Interface Table)
+   It evaluates to a buffer returning data in the format of a series of NFIT
+   Type Structure.
+
+   Arguments: None
+
+   Return Value:
+   A Buffer containing a list of NFIT Type structure entries.
+
+   The detailed definition of the structure can be found at ACPI 6.0: 5.2.25
+   NVDIMM Firmware Interface Table (NFIT).
+
+QEMU NVDIMM Implemention
+
+QEMU reserves a page starting from 0xFF0 and 4 bytes IO Port starting
+from 0x0a18 for NVDIMM ACPI.
+
+Memory 0xFF0 - 0xFF00FFF:
+   This page is RAM-based and it is used to transfer data between _DSM
+   method and QEMU. If ACPI has control, this pages is owned by ACPI which
+   writes _DSM input data to it, otherwise, it is owned by QEMU which
+   emulates _DSM access and writes the output data to it.
+
+   ACPI Writes _DSM Input Data:
+   [0xFF0 - 0xFF3]: 4 bytes, NVDIMM Devcie Handle, 0 is reserved
+for NVDIMM Root device.
+   [0xFF4 - 0xFF7]: 4 bytes, Revision ID, that is the Arg1 of _DSM
+method.
+   [0xFF8 - 0xFFB]: 4 bytes. Function Index, that is the Arg2 of
+_DSM method.
+   [0xFFC - 0xFF00FFF]: 4084 bytes, the Arg3 of _DSM method
+
+   QEMU Writes Output Data:
+   [0xFF0 - 0xFF00FFF]: the DSM return result filled by QEMU
+
+IO Port 0x0a18 - 0xa1b:
+   ACPI uses it to transfer control from guest to QEMU and read the size
+   of return result filled by QEMU
+
+   Read Access:
+   [0x0a18 - 0xa1b]: 4 bytes, the buffer size of _DSM output data.
+
+_DSM process diagram:
+-
+The page, 0xFF0 - 0xFF00FFF, is used by _DSM Virtualization.
+

[PATCH v7 26/35] nvdimm acpi: build ACPI NFIT table

2015-11-02 Thread Xiao Guangrong
NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)

Currently, we only support PMEM mode. Each device has 3 structures:
- SPA structure, defines the PMEM region info

- MEM DEV structure, it has the @handle which is used to associate specified
  ACPI NVDIMM  device we will introduce in later patch.
  Also we can happily ignored the memory device's interleave, the real
  nvdimm hardware access is hidden behind host

- DCR structure, it defines vendor ID used to associate specified vendor
  nvdimm driver. Since we only implement PMEM mode this time, Command
  window and Data window are not needed

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c| 355 
 hw/i386/acpi-build.c|   6 +
 include/hw/mem/nvdimm.h |  10 ++
 3 files changed, 371 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 1223da2..dd84e5f 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -26,8 +26,348 @@
  * License along with this library; if not, see 
  */
 
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/mem/nvdimm.h"
 
+static int nvdimm_plugged_device_list(Object *obj, void *opaque)
+{
+GSList **list = opaque;
+
+if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
+NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+if (memory_region_is_mapped(>nvdimm_mr)) {
+*list = g_slist_append(*list, DEVICE(obj));
+}
+}
+
+object_child_foreach(obj, nvdimm_plugged_device_list, opaque);
+return 0;
+}
+
+/*
+ * inquire plugged NVDIMM devices and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+static GSList *nvdimm_get_plugged_device_list(void)
+{
+GSList *list = NULL;
+
+object_child_foreach(qdev_get_machine(), nvdimm_plugged_device_list,
+ );
+return list;
+}
+
+#define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7) \
+   { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
+ (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,  \
+ (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }
+/*
+ * define Byte Addressable Persistent Memory (PM) Region according to
+ * ACPI 6.0: 5.2.25.1 System Physical Address Range Structure.
+ */
+static const uint8_t nvdimm_nfit_spa_uuid[] =
+  NVDIMM_UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33,
+ 0x18, 0xb7, 0x8c, 0xdb);
+
+/*
+ * NVDIMM Firmware Interface Table
+ * @signature: "NFIT"
+ *
+ * It provides information that allows OSPM to enumerate NVDIMM present in
+ * the platform and associate system physical address ranges created by the
+ * NVDIMMs.
+ *
+ * It is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ */
+struct NvdimmNfitHeader {
+ACPI_TABLE_HEADER_DEF
+uint32_t reserved;
+} QEMU_PACKED;
+typedef struct NvdimmNfitHeader NvdimmNfitHeader;
+
+/*
+ * define NFIT structures according to ACPI 6.0: 5.2.25 NVDIMM Firmware
+ * Interface Table (NFIT).
+ */
+
+/*
+ * System Physical Address Range Structure
+ *
+ * It describes the system physical address ranges occupied by NVDIMMs and
+ * the types of the regions.
+ */
+struct NvdimmNfitSpa {
+uint16_t type;
+uint16_t length;
+uint16_t spa_index;
+uint16_t flags;
+uint32_t reserved;
+uint32_t proximity_domain;
+uint8_t type_guid[16];
+uint64_t spa_base;
+uint64_t spa_length;
+uint64_t mem_attr;
+} QEMU_PACKED;
+typedef struct NvdimmNfitSpa NvdimmNfitSpa;
+
+/*
+ * Memory Device to System Physical Address Range Mapping Structure
+ *
+ * It enables identifying each NVDIMM region and the corresponding SPA
+ * describing the memory interleave
+ */
+struct NvdimmNfitMemDev {
+uint16_t type;
+uint16_t length;
+uint32_t nfit_handle;
+uint16_t phys_id;
+uint16_t region_id;
+uint16_t spa_index;
+uint16_t dcr_index;
+uint64_t region_len;
+uint64_t region_offset;
+uint64_t region_dpa;
+uint16_t interleave_index;
+uint16_t interleave_ways;
+uint16_t flags;
+uint16_t reserved;
+} QEMU_PACKED;
+typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
+
+/*
+ * NVDIMM Control Region Structure
+ *
+ * It describes the NVDIMM and if applicable, Block Control Window.
+ */
+struct NvdimmNfitControlRegion {
+uint16_t type;
+uint16_t length;
+uint16_t dcr_index;
+uint16_t vendor_id;
+uint16_t device_id;
+uint16_t revision_id;
+uint16_t sub_vendor_id;
+uint16_t sub_device_id;
+uint16_t sub_revision_id;
+uint8_t reserved[6];
+uint32_t serial_number;
+uint16_t fic;
+uint16_t num_bcw;
+uint64_t bcw_size;
+uint64_t cmd_offset;
+uint64_t cmd_size;
+uint64_t status_offset;
+uint64_t status_size;
+uint16_t flags;
+uint8_t 

[PATCH v7 28/35] nvdimm acpi: save arg3 for NVDIMM device _DSM method

2015-11-02 Thread Xiao Guangrong
Check if the input Arg3 is valid then store it into dsm_in if needed

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 53ed675..e179a72 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -524,13 +524,38 @@ static void nvdimm_build_acpi_devices(GSList 
*device_list, Aml *sb_scope)
 
 method = aml_method_serialized("NCAL", 4);
 {
-Aml *buffer_size = aml_local(0);
+Aml *ifctx, *pckg, *buffer_size = aml_local(0);
 
 aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
 aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
 aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
 
 /*
+ * The fourth parameter (Arg3) of _DSM is a package which contains
+ * a buffer, the layout of the buffer is specified by UUID (Arg0),
+ * Revision ID (Arg1) and Function Index (Arg2) which are documented
+ * in the DSM Spec.
+ */
+pckg = aml_arg(3);
+ifctx = aml_if(aml_and(aml_equal(aml_object_type(pckg),
+ aml_int(4 /* Package */)),
+   aml_equal(aml_sizeof(pckg),
+ aml_int(1;
+{
+Aml *pckg_index, *pckg_buf;
+
+pckg_index = aml_local(2);
+pckg_buf = aml_local(3);
+
+aml_append(ifctx, aml_store(aml_index(pckg, aml_int(0)),
+pckg_index));
+aml_append(ifctx, aml_store(aml_derefof(pckg_index),
+pckg_buf));
+aml_append(ifctx, aml_store(pckg_buf, aml_name("ARG3")));
+}
+aml_append(method, ifctx);
+
+/*
  * transfer control to QEMU and the buffer size filled by
  * QEMU is returned.
  */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 30/35] nvdimm acpi: support Get Namespace Label Size function

2015-11-02 Thread Xiao Guangrong
Function 4 is used to get Namespace label size

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 87 +++-
 1 file changed, 86 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 3895ad8..cdc361c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -407,15 +407,48 @@ enum {
 NVDIMM_DSM_DEV_STATUS_VENDOR_SPECIFIC_ERROR = 4,
 };
 
+struct NvdimmFuncInGetLabelData {
+uint32_t offset; /* the offset in the namespace label data area. */
+uint32_t length; /* the size of data is to be read via the function. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInGetLabelData NvdimmFuncInGetLabelData;
+
+struct NvdimmFuncInSetLabelData {
+uint32_t offset; /* the offset in the namespace label data area. */
+uint32_t length; /* the size of data is to be written via the function. */
+uint8_t in_buf[0]; /* the data written to label data area. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInSetLabelData NvdimmFuncInSetLabelData;
+
 struct NvdimmDsmIn {
 uint32_t handle;
 uint32_t revision;
 uint32_t function;
/* the remaining size in the page is used by arg3. */
-uint8_t arg3[0];
+union {
+uint8_t arg3[0];
+NvdimmFuncInSetLabelData func_set_label_data;
+};
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
 
+struct NvdimmFuncOutLabelSize {
+uint32_t status; /* return status code. */
+uint32_t label_size; /* the size of label data area. */
+/*
+ * Maximum size of the namespace label data length supported by
+ * the platform in Get/Set Namespace Label Data functions.
+ */
+uint32_t max_xfer;
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutLabelSize NvdimmFuncOutLabelSize;
+
+struct NvdimmFuncOutGetLabelData {
+uint32_t status;/*return status code. */
+uint8_t out_buf[0]; /* the data got via Get Namesapce Label function. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutGetLabelData NvdimmFuncOutGetLabelData;
+
 static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
 {
 status = cpu_to_le32(status);
@@ -445,6 +478,55 @@ static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
 nvdimm_dsm_write_status(out, status);
 }
 
+/*
+ * the max transfer size is the max size transferred by both a
+ * 'Get Namespace Label Data' function and a 'Set Namespace Label Data'
+ * function.
+ */
+static uint32_t nvdimm_get_max_xfer_label_size(void)
+{
+NvdimmDsmIn *in;
+uint32_t max_get_size, max_set_size, dsm_memory_size = TARGET_PAGE_SIZE;
+
+/*
+ * the max data ACPI can read one time which is transferred by
+ * the response of 'Get Namespace Label Data' function.
+ */
+max_get_size = dsm_memory_size - sizeof(NvdimmFuncOutGetLabelData);
+
+/*
+ * the max data ACPI can write one time which is transferred by
+ * 'Set Namespace Label Data' function.
+ */
+max_set_size = dsm_memory_size - offsetof(NvdimmDsmIn, arg3) -
+   sizeof(in->func_set_label_data);
+
+return MIN(max_get_size, max_set_size);
+}
+
+/*
+ * DSM Spec Rev1 4.4 Get Namespace Label Size (Function Index 4).
+ *
+ * It gets the size of Namespace Label data area and the max data size
+ * that Get/Set Namespace Label Data functions can transfer.
+ */
+static void nvdimm_dsm_func_label_size(NVDIMMDevice *nvdimm, GArray *out)
+{
+NvdimmFuncOutLabelSize func_label_size;
+uint32_t label_size, mxfer;
+
+label_size = nvdimm->label_size;
+mxfer = nvdimm_get_max_xfer_label_size();
+
+nvdimm_debug("label_size %#x, max_xfer %#x.\n", label_size, mxfer);
+
+func_label_size.status = cpu_to_le32(NVDIMM_DSM_STATUS_SUCCESS);
+func_label_size.label_size = cpu_to_le32(label_size);
+func_label_size.max_xfer = cpu_to_le32(mxfer);
+
+g_array_append_vals(out, _label_size, sizeof(func_label_size));
+}
+
 static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 {
 GSList *list = nvdimm_get_plugged_device_list();
@@ -469,6 +551,9 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
1 << 6 /* Set Namespace Label Data */);
 build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
 goto free;
+case 0x4 /* Get Namespace Label Size */:
+nvdimm_dsm_func_label_size(nvdimm, out);
+goto free;
 default:
 status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
 };
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 29/35] nvdimm acpi: support function 0

2015-11-02 Thread Xiao Guangrong
__DSM is defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method)

Function 0 is a query function. We do not support any function on root
device and only 3 functions are support for NVDIMM device, Get Namespace
Label Size, Get Namespace Label Data and Set Namespace Label Data, that
means we currently only allow to access device's Label Namespace

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c |   2 +-
 hw/acpi/nvdimm.c| 158 +++-
 include/hw/acpi/aml-build.h |   1 +
 3 files changed, 159 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 8bee8b2..90229c5 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -231,7 +231,7 @@ static void build_extop_package(GArray *package, uint8_t op)
 build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
 }
 
-static void build_append_int_noprefix(GArray *table, uint64_t value, int size)
+void build_append_int_noprefix(GArray *table, uint64_t value, int size)
 {
 int i;
 
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index e179a72..3895ad8 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -212,6 +212,22 @@ static uint32_t nvdimm_slot_to_dcr_index(int slot)
 return nvdimm_slot_to_spa_index(slot) + 1;
 }
 
+static NVDIMMDevice
+*nvdimm_get_device_by_handle(GSList *list, uint32_t handle)
+{
+for (; list; list = list->next) {
+NVDIMMDevice *nvdimm = list->data;
+int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+   NULL);
+
+if (nvdimm_slot_to_handle(slot) == handle) {
+return nvdimm;
+}
+}
+
+return NULL;
+}
+
 /* ACPI 6.0: 5.2.25.1 System Physical Address Range Structure */
 static void
 nvdimm_build_structure_spa(GArray *structures, NVDIMMDevice *nvdimm)
@@ -368,6 +384,29 @@ static void nvdimm_build_nfit(GSList *device_list, GArray 
*table_offsets,
 g_array_free(structures, true);
 }
 
+/* define NVDIMM DSM return status codes according to DSM Spec Rev1. */
+enum {
+/* Common return status codes. */
+/* Success */
+NVDIMM_DSM_STATUS_SUCCESS = 0,
+/* Not Supported */
+NVDIMM_DSM_STATUS_NOT_SUPPORTED = 1,
+
+/* NVDIMM Root Device _DSM function return status codes*/
+/* Invalid Input Parameters */
+NVDIMM_DSM_ROOT_DEV_STATUS_INVALID_PARAS = 2,
+/* Function-Specific Error */
+NVDIMM_DSM_ROOT_DEV_STATUS_FUNCTION_SPECIFIC_ERROR = 3,
+
+/* NVDIMM Device (non-root) _DSM function return status codes*/
+/* Non-Existing Memory Device */
+NVDIMM_DSM_DEV_STATUS_NON_EXISTING_MEM_DEV = 2,
+/* Invalid Input Parameters */
+NVDIMM_DSM_DEV_STATUS_INVALID_PARAS = 3,
+/* Vendor Specific Error */
+NVDIMM_DSM_DEV_STATUS_VENDOR_SPECIFIC_ERROR = 4,
+};
+
 struct NvdimmDsmIn {
 uint32_t handle;
 uint32_t revision;
@@ -377,10 +416,127 @@ struct NvdimmDsmIn {
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
 
+static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
+{
+status = cpu_to_le32(status);
+build_append_int_noprefix(out, status, sizeof(status));
+}
+
+static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
+{
+uint32_t status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
+
+/*
+ * Query command implemented per ACPI Specification, it is defined in
+ * ACPI 6.0: 9.14.1 _DSM (Device Specific Method).
+ */
+if (in->function == 0x0) {
+/*
+ * Set it to zero to indicate no function is supported for NVDIMM
+ * root.
+ */
+uint64_t cmd_list = cpu_to_le64(0);
+
+build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
+return;
+}
+
+nvdimm_debug("Return status %#x.\n", status);
+nvdimm_dsm_write_status(out, status);
+}
+
+static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
+{
+GSList *list = nvdimm_get_plugged_device_list();
+NVDIMMDevice *nvdimm = nvdimm_get_device_by_handle(list, in->handle);
+uint32_t status = NVDIMM_DSM_DEV_STATUS_NON_EXISTING_MEM_DEV;
+uint64_t cmd_list;
+
+if (!nvdimm) {
+goto set_status_free;
+}
+
+/* Encode DSM function according to DSM Spec Rev1. */
+switch (in->function) {
+/* see comments in nvdimm_dsm_root(). */
+case 0x0:
+cmd_list = cpu_to_le64(0x1 /* Bit 0 indicates whether there is
+  support for any functions other
+  than function 0.
+*/   |
+   1 << 4 /* Get Namespace Label Size */ |
+   1 << 5 /* Get Namespace Label Data */ |
+   1 << 6 /* Set Namespace Label Data */);
+build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
+goto free;
+default:

[PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-02 Thread Xiao Guangrong
A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
for detailed design

A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
that controls if nvdimm support is enabled, it is true on default and
it is false on 2.4 and its earlier version to keep compatibility

Signed-off-by: Xiao Guangrong 
---
 default-configs/i386-softmmu.mak |  1 +
 default-configs/mips-softmmu.mak |  1 +
 default-configs/mips64-softmmu.mak   |  1 +
 default-configs/mips64el-softmmu.mak |  1 +
 default-configs/mipsel-softmmu.mak   |  1 +
 default-configs/x86_64-softmmu.mak   |  1 +
 hw/acpi/Makefile.objs|  1 +
 hw/acpi/ich9.c   | 24 ++
 hw/acpi/nvdimm.c | 63 
 hw/acpi/piix4.c  | 27 
 include/hw/acpi/ich9.h   |  3 ++
 include/hw/i386/pc.h | 10 ++
 include/hw/mem/nvdimm.h  | 34 +++
 13 files changed, 161 insertions(+), 7 deletions(-)
 create mode 100644 hw/acpi/nvdimm.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4e84a1c..51e71d4 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -48,6 +48,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 44467c3..6b8b70e 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mips64-softmmu.mak 
b/default-configs/mips64-softmmu.mak
index 66ed5f9..ea820f6 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mips64el-softmmu.mak 
b/default-configs/mips64el-softmmu.mak
index bfca2b2..8993851 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mipsel-softmmu.mak 
b/default-configs/mipsel-softmmu.mak
index 0162ef0..87ab964 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index e877a86..0a7dc10 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -48,6 +48,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 7d3230c..84c082d 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
 common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
+obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
 common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
 common-obj-$(CONFIG_ACPI) += aml-build.o
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1e9ae20..603c1bd 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -280,6 +280,12 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
 acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), 
OBJECT(lpc_pci),
  >acpi_memory_hotplug);
 }
+
+if (pm->acpi_nvdimm_state.is_enabled) {
+nvdimm_init_acpi_state(pci_address_space(lpc_pci),
+   pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
+   >acpi_nvdimm_state);
+}
 }
 
 static void ich9_pm_get_gpe0_blk(Object *obj, Visitor *v,
@@ -307,6 +313,20 @@ static void ich9_pm_set_memory_hotplug_support(Object 
*obj, bool value,
 s->pm.acpi_memory_hotplug.is_enabled = value;
 }
 
+static bool ich9_pm_get_nvdimm_support(Object *obj, Error **errp)
+{
+ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+return s->pm.acpi_nvdimm_state.is_enabled;
+}
+
+static void ich9_pm_set_nvdimm_support(Object 

[PATCH v7 19/35] dimm: abstract dimm device from pc-dimm

2015-11-02 Thread Xiao Guangrong
A base device, dimm, is abstracted from pc-dimm, so that we can
build nvdimm device based on dimm in the later patch

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Xiao Guangrong 
---
 default-configs/i386-softmmu.mak   |  1 +
 default-configs/ppc64-softmmu.mak  |  1 +
 default-configs/x86_64-softmmu.mak |  1 +
 hw/mem/Makefile.objs   |  3 ++-
 hw/mem/dimm.c  | 11 ++---
 hw/mem/pc-dimm.c   | 46 ++
 include/hw/mem/dimm.h  |  4 ++--
 include/hw/mem/pc-dimm.h   |  7 ++
 8 files changed, 62 insertions(+), 12 deletions(-)
 create mode 100644 hw/mem/pc-dimm.c
 create mode 100644 include/hw/mem/pc-dimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..3ece8bb 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_X86_ICH=y
+CONFIG_DIMM=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
diff --git a/default-configs/ppc64-softmmu.mak 
b/default-configs/ppc64-softmmu.mak
index bb71b23..482b8a1 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -54,3 +54,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_MC146818RTC=y
 CONFIG_ISA_TESTDEV=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_DIMM=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index dfb8095..92ea7c1 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_X86_ICH=y
+CONFIG_DIMM=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 7563ef5..cebb4b1 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@
-common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o
+common-obj-$(CONFIG_DIMM) += dimm.o
+common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 9f55cee..4a63409 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -1,5 +1,5 @@
 /*
- * Dimm device for Memory Hotplug
+ * Dimm device abstraction
  *
  * Copyright ProfitBricks GmbH 2012
  * Copyright (C) 2014 Red Hat Inc
@@ -429,21 +429,13 @@ static void dimm_realize(DeviceState *dev, Error **errp)
 }
 }
 
-static MemoryRegion *dimm_get_memory_region(DIMMDevice *dimm)
-{
-return host_memory_backend_get_memory(dimm->hostmem, _abort);
-}
-
 static void dimm_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
-DIMMDeviceClass *ddc = DIMM_CLASS(oc);
 
 dc->realize = dimm_realize;
 dc->props = dimm_properties;
 dc->desc = "DIMM memory module";
-
-ddc->get_memory_region = dimm_get_memory_region;
 }
 
 static TypeInfo dimm_info = {
@@ -453,6 +445,7 @@ static TypeInfo dimm_info = {
 .instance_init = dimm_init,
 .class_init= dimm_class_init,
 .class_size= sizeof(DIMMDeviceClass),
+.abstract  = true,
 };
 
 static void dimm_register_types(void)
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
new file mode 100644
index 000..38323e9
--- /dev/null
+++ b/hw/mem/pc-dimm.c
@@ -0,0 +1,46 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * Copyright (C) 2014 Red Hat Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "hw/mem/pc-dimm.h"
+
+static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
+{
+return host_memory_backend_get_memory(dimm->hostmem, _abort);
+}
+
+static void pc_dimm_class_init(ObjectClass *oc, void *data)
+{
+DIMMDeviceClass *ddc = DIMM_CLASS(oc);
+
+ddc->get_memory_region = pc_dimm_get_memory_region;
+}
+
+static TypeInfo pc_dimm_info = {
+.name  = TYPE_PC_DIMM,
+.parent= TYPE_DIMM,
+.class_init= pc_dimm_class_init,
+};
+
+static void pc_dimm_register_types(void)
+{
+type_register_static(_dimm_info);
+}
+
+type_init(pc_dimm_register_types)
diff --git a/include/hw/mem/dimm.h b/include/hw/mem/dimm.h
index ece8786..50f768a 100644
--- a/include/hw/mem/dimm.h
+++ b/include/hw/mem/dimm.h
@@ -1,5 +1,5 @@
 /*
- * PC DIMM device
+ * 

[PATCH v7 21/35] dimm: keep the state of the whole backend memory

2015-11-02 Thread Xiao Guangrong
QEMU keeps the state of memory of dimm device during live migration,
however, it is not enough for nvdimm device as its memory does not
contain its label data, so that we should protect the whole backend
memory instead

Signed-off-by: Xiao Guangrong 
---
 hw/mem/dimm.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 498d380..7d1 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -134,9 +134,16 @@ void dimm_memory_plug(DeviceState *dev, MemoryHotplugState 
*hpms,
 }
 
 memory_region_add_subregion(>mr, addr - hpms->base, mr);
-vmstate_register_ram(mr, dev);
 numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
+/*
+ * save the state only for @mr is not enough as it does not contain
+ * the label data of NVDIMM device, so that we keep the state of
+ * whole hostmem instead.
+ */
+vmstate_register_ram(host_memory_backend_get_memory(dimm->hostmem, errp),
+ dev);
+
 out:
 error_propagate(errp, local_err);
 }
@@ -145,10 +152,13 @@ void dimm_memory_unplug(DeviceState *dev, 
MemoryHotplugState *hpms,
MemoryRegion *mr)
 {
 DIMMDevice *dimm = DIMM(dev);
+MemoryRegion *backend_mr;
+
+backend_mr = host_memory_backend_get_memory(dimm->hostmem, _abort);
 
 numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
 memory_region_del_subregion(>mr, mr);
-vmstate_unregister_ram(mr, dev);
+vmstate_unregister_ram(backend_mr, dev);
 }
 
 int qmp_dimm_device_list(Object *obj, void *opaque)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 18/35] pc-dimm: rename pc-dimm.c and pc-dimm.h

2015-11-02 Thread Xiao Guangrong
Rename:
   pc-dimm.c => dimm.c
   pc-dimm.h => dimm.h

It prepares the work which abstracts dimm device type for both pc-dimm and
nvdimm

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Xiao Guangrong 
---
 hw/Makefile.objs | 2 +-
 hw/acpi/ich9.c   | 2 +-
 hw/acpi/memory_hotplug.c | 4 ++--
 hw/acpi/piix4.c  | 2 +-
 hw/i386/pc.c | 2 +-
 hw/mem/Makefile.objs | 2 +-
 hw/mem/{pc-dimm.c => dimm.c} | 2 +-
 hw/ppc/spapr.c   | 2 +-
 include/hw/i386/pc.h | 2 +-
 include/hw/mem/{pc-dimm.h => dimm.h} | 0
 include/hw/ppc/spapr.h   | 2 +-
 numa.c   | 2 +-
 qmp.c| 2 +-
 stubs/qmp_dimm_device_list.c | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)
 rename hw/mem/{pc-dimm.c => dimm.c} (99%)
 rename include/hw/mem/{pc-dimm.h => dimm.h} (100%)

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 7e7c241..12ecda9 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -30,8 +30,8 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
 devices-dirs-$(CONFIG_SMBIOS) += smbios/
+devices-dirs-y += mem/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index b0d6a67..1e9ae20 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -35,7 +35,7 @@
 #include "exec/address-spaces.h"
 
 #include "hw/i386/ich9.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 
 //#define DEBUG
 
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index e687852..20d3093 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,6 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "hw/boards.h"
 #include "hw/qdev-core.h"
 #include "trace.h"
@@ -148,7 +148,7 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr 
addr, uint64_t data,
 
 dev = DEVICE(mdev->dimm);
 hotplug_ctrl = qdev_get_hotplug_handler(dev);
-/* call pc-dimm unplug cb */
+/* call dimm unplug cb */
 hotplug_handler_unplug(hotplug_ctrl, dev, _err);
 if (local_err) {
 trace_mhp_acpi_dimm_delete_failed(mem_st->selector);
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 0b2cb6e..b2f5b2c 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -33,7 +33,7 @@
 #include "hw/acpi/pcihp.h"
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/hotplug.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
 #include "hw/xen/xen.h"
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 67ecc4f..6bf569a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -62,7 +62,7 @@
 #include "hw/boards.h"
 #include "hw/pci/pci_host.h"
 #include "acpi-build.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
 
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..7563ef5 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1 @@
-common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o
diff --git a/hw/mem/pc-dimm.c b/hw/mem/dimm.c
similarity index 99%
rename from hw/mem/pc-dimm.c
rename to hw/mem/dimm.c
index 67afc53..9f55cee 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/dimm.c
@@ -18,7 +18,7 @@
  * License along with this library; if not, see 
  */
 
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "qemu/config-file.h"
 #include "qapi/visitor.h"
 #include "qemu/range.h"
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ab6eb83..9ff24fd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2199,7 +2199,7 @@ static void spapr_machine_device_plug(HotplugHandler 
*hotplug_dev,
  *
  * - Memory gets hotplugged to a different node than what the user
  *   specified.
- * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
+ * - Since dimm subsystem in QEMU still thinks that memory belongs
  *   to memory-less node, a reboot will set things accordingly
  *   and the previously hotplugged memory now ends in the right node.
  *   This appears as if some memory moved from one node to another.
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 606dbc2..62e8fb5 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -16,7 +16,7 @@
 #include "hw/pci/pci.h"
 #include "hw/boards.h"
 #include "hw/compat.h"
-#include "hw/mem/pc-dimm.h"

[PATCH v7 22/35] dimm: introduce realize callback

2015-11-02 Thread Xiao Guangrong
nvdimm need check if the backend memory is large enough to contain label
data and init its memory region when the device is realized, so introduce
realize callback which is called after common dimm has been realize

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Xiao Guangrong 
---
 hw/mem/dimm.c | 5 +
 include/hw/mem/dimm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 7d1..0ae23ce 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -426,6 +426,7 @@ static void dimm_init(Object *obj)
 static void dimm_realize(DeviceState *dev, Error **errp)
 {
 DIMMDevice *dimm = DIMM(dev);
+DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
 
 if (!dimm->hostmem) {
 error_setg(errp, "'" DIMM_MEMDEV_PROP "' property is not set");
@@ -438,6 +439,10 @@ static void dimm_realize(DeviceState *dev, Error **errp)
dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
 return;
 }
+
+if (ddc->realize) {
+ddc->realize(dimm, errp);
+}
 }
 
 static void dimm_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/mem/dimm.h b/include/hw/mem/dimm.h
index 50f768a..72ec24c 100644
--- a/include/hw/mem/dimm.h
+++ b/include/hw/mem/dimm.h
@@ -65,6 +65,7 @@ typedef struct DIMMDeviceClass {
 DeviceClass parent_class;
 
 /* public */
+void (*realize)(DIMMDevice *dimm, Error **errp);
 MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);
 } DIMMDeviceClass;
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 23/35] nvdimm: implement NVDIMM device abstract

2015-11-02 Thread Xiao Guangrong
Introduce "nvdimm" device which is based on dimm device type

128K memory region which is the minimum namespace label size
required by NVDIMM Namespace Spec locates at the end of
backend memory device is reserved for label data

We can use "-m 1G,maxmem=100G,slots=10 -object memory-backend-file,
id=mem1,size=1G,mem-path=/dev/pmem0 -device nvdimm,memdev=mem1" to
create NVDIMM device for guest

Signed-off-by: Xiao Guangrong 
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/acpi/memory_hotplug.c   |   6 ++
 hw/mem/Makefile.objs   |   1 +
 hw/mem/nvdimm.c| 116 +
 include/hw/mem/nvdimm.h|  83 ++
 6 files changed, 208 insertions(+)
 create mode 100644 hw/mem/nvdimm.c
 create mode 100644 include/hw/mem/nvdimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 3ece8bb..4e84a1c 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 92ea7c1..e877a86 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 20d3093..bb5a29f 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
 #include "hw/mem/dimm.h"
+#include "hw/mem/nvdimm.h"
 #include "hw/boards.h"
 #include "hw/qdev-core.h"
 #include "trace.h"
@@ -231,6 +232,11 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, 
MemHotplugState *mem_st,
 {
 MemStatus *mdev;
 
+/* Currently, NVDIMM hotplug has not been supported yet. */
+if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
+return;
+}
+
 mdev = acpi_memory_slot_status(mem_st, dev, errp);
 if (!mdev) {
 return;
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index cebb4b1..12d9b72 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,3 @@
 common-obj-$(CONFIG_DIMM) += dimm.o
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
new file mode 100644
index 000..c310887
--- /dev/null
+++ b/hw/mem/nvdimm.c
@@ -0,0 +1,116 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qapi/visitor.h"
+#include "hw/mem/nvdimm.h"
+
+static MemoryRegion *nvdimm_get_memory_region(DIMMDevice *dimm)
+{
+NVDIMMDevice *nvdimm = NVDIMM(dimm);
+
+/* plug a NVDIMM device which is not properly realized? */
+assert(memory_region_size(>nvdimm_mr));
+
+return >nvdimm_mr;
+}
+
+static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
+{
+MemoryRegion *mr;
+NVDIMMDevice *nvdimm = NVDIMM(dimm);
+uint64_t size;
+
+nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
+
+mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+size = memory_region_size(mr);
+
+if (size <= nvdimm->label_size) {
+char *path = 
object_get_canonical_path_component(OBJECT(dimm->hostmem));
+error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
+   " to contain nvdimm namespace label (0x%" PRIx64 ")", path,
+   memory_region_size(mr), nvdimm->label_size);
+return;
+}
+
+memory_region_init_alias(>nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
+ mr, 0, size - nvdimm->label_size);
+nvdimm->label_data = memory_region_get_ram_ptr(mr) +
+ memory_region_size(>nvdimm_mr);
+}
+
+static void 

[PATCH v7 12/35] util: let qemu_fd_getlength support block device

2015-11-02 Thread Xiao Guangrong
lseek can not work for all block devices as the man page says:
| Some devices are incapable of seeking and POSIX does not specify
| which devices must support lseek().

This patch tries to add the support on Linux by using BLKGETSIZE64
ioctl

Signed-off-by: Xiao Guangrong 
---
 util/osdep.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/util/osdep.c b/util/osdep.c
index 5a61e19..b20c793 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -45,6 +45,11 @@
 extern int madvise(caddr_t, size_t, int);
 #endif
 
+#ifdef CONFIG_LINUX
+#include 
+#include 
+#endif
+
 #include "qemu-common.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
@@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
 {
 int64_t size;
 
+#ifdef CONFIG_LINUX
+struct stat stat_buf;
+if (fstat(fd, _buf) < 0) {
+return -errno;
+}
+
+if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, )) {
+/* The size of block device is larger than max int64_t? */
+if (size < 0) {
+return -EOVERFLOW;
+}
+return size;
+}
+#endif
+
 size = lseek(fd, 0, SEEK_END);
 if (size < 0) {
 return -errno;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 17/35] stubs: rename qmp_pc_dimm_device_list.c

2015-11-02 Thread Xiao Guangrong
Rename qmp_pc_dimm_device_list.c to qmp_dimm_device_list.c

Signed-off-by: Xiao Guangrong 
---
 stubs/Makefile.objs | 2 +-
 stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (100%)

diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 251443b..d5c862a 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -32,6 +32,6 @@ stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
 stub-obj-y += kvm.o
-stub-obj-y += qmp_pc_dimm_device_list.o
+stub-obj-y += qmp_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += vhost.o
diff --git a/stubs/qmp_pc_dimm_device_list.c b/stubs/qmp_dimm_device_list.c
similarity index 100%
rename from stubs/qmp_pc_dimm_device_list.c
rename to stubs/qmp_dimm_device_list.c
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 11/35] util: introduce qemu_file_getlength()

2015-11-02 Thread Xiao Guangrong
It is used to get the size of the specified file, also qemu_fd_getlength()
is introduced to unify the code with raw_getlength() in block/raw-posix.c

Signed-off-by: Xiao Guangrong 
---
 block/raw-posix.c|  7 +--
 include/qemu/osdep.h |  2 ++
 util/osdep.c | 31 +++
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 918c756..734e6dd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 int ret;
-int64_t size;
 
 ret = fd_open(bs);
 if (ret < 0) {
 return ret;
 }
 
-size = lseek(s->fd, 0, SEEK_END);
-if (size < 0) {
-return -errno;
-}
-return size;
+return qemu_fd_getlength(s->fd);
 }
 #endif
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index dbc17dc..ca4c3fa 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
 pid_t qemu_fork(Error **errp);
 
 size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
+int64_t qemu_fd_getlength(int fd);
+size_t qemu_file_getlength(const char *file, Error **errp);
 #endif
diff --git a/util/osdep.c b/util/osdep.c
index 0092bb6..5a61e19 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
 return readv_writev(fd, iov, iov_cnt, true);
 }
 #endif
+
+int64_t qemu_fd_getlength(int fd)
+{
+int64_t size;
+
+size = lseek(fd, 0, SEEK_END);
+if (size < 0) {
+return -errno;
+}
+return size;
+}
+
+size_t qemu_file_getlength(const char *file, Error **errp)
+{
+int64_t size;
+int fd = qemu_open(file, O_RDONLY);
+
+if (fd < 0) {
+error_setg_file_open(errp, errno, file);
+return 0;
+}
+
+size = qemu_fd_getlength(fd);
+if (size < 0) {
+error_setg_errno(errp, -size, "can't get size of file %s", file);
+size = 0;
+}
+
+qemu_close(fd);
+return size;
+}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] KVM: x86: simplify RSM into 64-bit protected mode

2015-11-02 Thread Paolo Bonzini


On 31/10/2015 20:50, Laszlo Ersek wrote:
> Tested-by: Laszlo Ersek 

Thanks Laszlo, I applied patches 1 and 2 (since your "part 2" never was :)).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails

2015-11-02 Thread Thomas Huth
On 27/10/15 06:13, Paul Mackerras wrote:
> When handling a hypervisor data or instruction storage interrupt (HDSI
> or HISI), we look up the SLB entry for the address being accessed in
> order to translate the effective address to a virtual address which can
> be looked up in the guest HPT.  This lookup can occasionally fail due
> to the guest replacing an SLB entry without invalidating the evicted
> SLB entry.  In this situation an ERAT (effective to real address
> translation cache) entry can persist and be used by the hardware even
> though there is no longer a corresponding SLB entry.
> 
> Previously we would just deliver a data or instruction storage interrupt
> (DSI or ISI) to the guest in this case.  However, this is not correct
> and has been observed to cause guests to crash, typically with a
> data storage protection interrupt on a store to the vmemmap area.
> 
> Instead, what we do now is to synthesize a data or instruction segment
> interrupt.  That should cause the guest to reload an appropriate entry
> into the SLB and retry the faulting instruction.  If it still faults,
> we should find an appropriate SLB entry next time and be able to handle
> the fault.
> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b1dab8d..3c6badc 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1749,7 +1749,8 @@ kvmppc_hdsi:
>   beq 3f
>   clrrdi  r0, r4, 28
>   PPC_SLBFEE_DOT(R5, R0)  /* if so, look up SLB */
> - bne 1f  /* if no SLB entry found */
> + li  r0, BOOK3S_INTERRUPT_DATA_SEGMENT
> + bne 7f  /* if no SLB entry found */
>  4:   std r4, VCPU_FAULT_DAR(r9)
>   stw r6, VCPU_FAULT_DSISR(r9)
>  
> @@ -1768,14 +1769,15 @@ kvmppc_hdsi:
>   cmpdi   r3, -2  /* MMIO emulation; need instr word */
>   beq 2f
>  
> - /* Synthesize a DSI for the guest */
> + /* Synthesize a DSI (or DSegI) for the guest */
>   ld  r4, VCPU_FAULT_DAR(r9)
>   mr  r6, r3
> -1:   mtspr   SPRN_DAR, r4
> +1:   li  r0, BOOK3S_INTERRUPT_DATA_STORAGE
>   mtspr   SPRN_DSISR, r6
> +7:   mtspr   SPRN_DAR, r4

I first though "hey, you're not setting DSISR for the DsegI now" ... but
then I saw in the PowerISA that DSISR is "undefined" for the DSegI, so
this should be fine, I think.

>   mtspr   SPRN_SRR0, r10
>   mtspr   SPRN_SRR1, r11
> - li  r10, BOOK3S_INTERRUPT_DATA_STORAGE
> + mr  r10, r0
>   bl  kvmppc_msr_interrupt
>  fast_interrupt_c_return:
>  6:   ld  r7, VCPU_CTR(r9)
> @@ -1823,7 +1825,8 @@ kvmppc_hisi:
>   beq 3f
>   clrrdi  r0, r10, 28
>   PPC_SLBFEE_DOT(R5, R0)  /* if so, look up SLB */
> - bne 1f  /* if no SLB entry found */
> + li  r0, BOOK3S_INTERRUPT_INST_SEGMENT
> + bne 7f  /* if no SLB entry found */
>  4:
>   /* Search the hash table. */
>   mr  r3, r9  /* vcpu pointer */
> @@ -1840,11 +1843,12 @@ kvmppc_hisi:
>   cmpdi   r3, -1  /* handle in kernel mode */
>   beq guest_exit_cont
>  
> - /* Synthesize an ISI for the guest */
> + /* Synthesize an ISI (or ISegI) for the guest */
>   mr  r11, r3
> -1:   mtspr   SPRN_SRR0, r10
> +1:   li  r0, BOOK3S_INTERRUPT_INST_STORAGE
> +7:   mtspr   SPRN_SRR0, r10
>   mtspr   SPRN_SRR1, r11
> - li  r10, BOOK3S_INTERRUPT_INST_STORAGE
> + mr  r10, r0
>   bl  kvmppc_msr_interrupt
>   b   fast_interrupt_c_return

As far as I can see (but I'm really not an expert here), the patch seems
to be fine. And we have a test running with it for almost 6 days now and
did not see any further guest crashes, so it seems to me like this is
the right fix for this issue! So if you like, you can add my
"Reviewed-by" or "Tested-by" to this patch.

 Thanks a lot for fixing this!
  Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 32/35] nvdimm acpi: support Set Namespace Label Data function

2015-11-02 Thread Xiao Guangrong
Function 6 is used to set Namespace Label Data

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index f30e2ff..2553be9 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -587,6 +587,34 @@ exit:
 nvdimm_dsm_write_status(out, status);
 }
 
+/*
+ * DSM Spec Rev1 4.6 Set Namespace Label Data (Function Index 6).
+ */
+static void nvdimm_dsm_func_set_label_data(NVDIMMDevice *nvdimm,
+   NvdimmDsmIn *in, GArray *out)
+{
+NVDIMMClass *nvc = NVDIMM_GET_CLASS(nvdimm);
+NvdimmFuncInSetLabelData *set_label_data = >func_set_label_data;
+uint32_t status;
+
+le32_to_cpus(_label_data->offset);
+le32_to_cpus(_label_data->length);
+
+nvdimm_debug("Write Label Data: offset %#x length %#x.\n",
+ set_label_data->offset, set_label_data->length);
+
+status = nvdimm_rw_label_data_check(nvdimm, set_label_data->offset,
+set_label_data->length);
+if (status != NVDIMM_DSM_STATUS_SUCCESS) {
+goto exit;
+}
+
+nvc->write_label_data(nvdimm, set_label_data->in_buf,
+  set_label_data->length, set_label_data->offset);
+exit:
+nvdimm_dsm_write_status(out, status);
+}
+
 static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 {
 GSList *list = nvdimm_get_plugged_device_list();
@@ -617,6 +645,9 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 case 0x5 /* Get Namespace Label Data */:
 nvdimm_dsm_func_get_label_data(nvdimm, in, out);
 goto free;
+case 0x6 /* Set Namespace Label Data */:
+nvdimm_dsm_func_set_label_data(nvdimm, in, out);
+goto free;
 default:
 status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
 };
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 31/35] nvdimm acpi: support Get Namespace Label Data function

2015-11-02 Thread Xiao Guangrong
Function 5 is used to get Namespace Label Data

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 63 
 1 file changed, 63 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index cdc361c..f30e2ff 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -428,6 +428,7 @@ struct NvdimmDsmIn {
 union {
 uint8_t arg3[0];
 NvdimmFuncInSetLabelData func_set_label_data;
+NvdimmFuncInGetLabelData func_get_label_data;
 };
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
@@ -527,6 +528,65 @@ static void nvdimm_dsm_func_label_size(NVDIMMDevice 
*nvdimm, GArray *out)
 g_array_append_vals(out, _label_size, sizeof(func_label_size));
 }
 
+static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice *nvdimm,
+   uint32_t offset, uint32_t length)
+{
+if (offset + length < offset) {
+nvdimm_debug("offset %#x + length %#x is overflow.\n", offset,
+ length);
+return NVDIMM_DSM_DEV_STATUS_INVALID_PARAS;
+}
+
+if (nvdimm->label_size < offset + length) {
+nvdimm_debug("position %#x is beyond label data (len = %#lx).\n",
+ offset + length, nvdimm->label_size);
+return NVDIMM_DSM_DEV_STATUS_INVALID_PARAS;
+}
+
+if (length > nvdimm_get_max_xfer_label_size()) {
+nvdimm_debug("length (%#x) is larger than max_xfer (%#x).\n",
+ length, nvdimm_get_max_xfer_label_size());
+return NVDIMM_DSM_DEV_STATUS_INVALID_PARAS;
+}
+
+return NVDIMM_DSM_STATUS_SUCCESS;
+}
+
+/*
+ * DSM Spec Rev1 4.5 Get Namespace Label Data (Function Index 5).
+ */
+static void nvdimm_dsm_func_get_label_data(NVDIMMDevice *nvdimm,
+   NvdimmDsmIn *in, GArray *out)
+{
+NVDIMMClass *nvc = NVDIMM_GET_CLASS(nvdimm);
+NvdimmFuncInGetLabelData *get_label_data = >func_get_label_data;
+void *buf;
+uint32_t status;
+
+le32_to_cpus(_label_data->offset);
+le32_to_cpus(_label_data->length);
+
+nvdimm_debug("Read Label Data: offset %#x length %#x.\n",
+ get_label_data->offset, get_label_data->length);
+
+status = nvdimm_rw_label_data_check(nvdimm, get_label_data->offset,
+get_label_data->length);
+if (status != NVDIMM_DSM_STATUS_SUCCESS) {
+goto exit;
+}
+
+/* write nvdimm_func_out_get_label_data.status. */
+nvdimm_dsm_write_status(out, status);
+/* write nvdimm_func_out_get_label_data.out_buf. */
+buf = acpi_data_push(out, get_label_data->length);
+nvc->read_label_data(nvdimm, buf, get_label_data->length,
+ get_label_data->offset);
+return;
+
+exit:
+nvdimm_dsm_write_status(out, status);
+}
+
 static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 {
 GSList *list = nvdimm_get_plugged_device_list();
@@ -554,6 +614,9 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 case 0x4 /* Get Namespace Label Size */:
 nvdimm_dsm_func_label_size(nvdimm, out);
 goto free;
+case 0x5 /* Get Namespace Label Data */:
+nvdimm_dsm_func_get_label_data(nvdimm, in, out);
+goto free;
 default:
 status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
 };
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 33/35] nvdimm: allow using whole backend memory as pmem

2015-11-02 Thread Xiao Guangrong
Introduce a parameter, named "reserve-label-data", if it is
false which indicates that QEMU does not reserve any region
on the backend memory to support label data. It is a
'label-less' NVDIMM device mode that linux will use whole
memory on the device as a single namesapce

This is useful for the users who want to pass whole nvdimm
device and make its data completely be visible to guest

The parameter is false on default

Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c| 12 
 hw/mem/nvdimm.c | 43 ---
 include/hw/mem/nvdimm.h |  6 ++
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 2553be9..eab5d9c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -531,6 +531,12 @@ static void nvdimm_dsm_func_label_size(NVDIMMDevice 
*nvdimm, GArray *out)
 static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice *nvdimm,
uint32_t offset, uint32_t length)
 {
+if (!nvdimm->reserve_label_data) {
+nvdimm_debug("read/write label request on the device without "
+ "label data reserved.\n");
+return NVDIMM_DSM_STATUS_NOT_SUPPORTED;
+}
+
 if (offset + length < offset) {
 nvdimm_debug("offset %#x + length %#x is overflow.\n", offset,
  length);
@@ -637,6 +643,12 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
1 << 4 /* Get Namespace Label Size */ |
1 << 5 /* Get Namespace Label Data */ |
1 << 6 /* Set Namespace Label Data */);
+
+/* no function support if the device does not have label data. */
+if (!nvdimm->reserve_label_data) {
+cmd_list = cpu_to_le64(0);
+}
+
 build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
 goto free;
 case 0x4 /* Get Namespace Label Size */:
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index c310887..fde1c7f 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -39,14 +39,15 @@ static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
 {
 MemoryRegion *mr;
 NVDIMMDevice *nvdimm = NVDIMM(dimm);
-uint64_t size;
+uint64_t reserved_label_size, size;
 
 nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
+reserved_label_size = nvdimm->reserve_label_data ? nvdimm->label_size : 0;
 
 mr = host_memory_backend_get_memory(dimm->hostmem, errp);
 size = memory_region_size(mr);
 
-if (size <= nvdimm->label_size) {
+if (size <= reserved_label_size) {
 char *path = 
object_get_canonical_path_component(OBJECT(dimm->hostmem));
 error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
" to contain nvdimm namespace label (0x%" PRIx64 ")", path,
@@ -55,15 +56,19 @@ static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
 }
 
 memory_region_init_alias(>nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
- mr, 0, size - nvdimm->label_size);
-nvdimm->label_data = memory_region_get_ram_ptr(mr) +
- memory_region_size(>nvdimm_mr);
+ mr, 0, size - reserved_label_size);
+
+if (reserved_label_size) {
+nvdimm->label_data = memory_region_get_ram_ptr(mr) +
+ memory_region_size(>nvdimm_mr);
+}
 }
 
 static void nvdimm_read_label_data(NVDIMMDevice *nvdimm, void *buf,
uint64_t size, uint64_t offset)
 {
-assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+assert(nvdimm->reserve_label_data &&
+   (nvdimm->label_size >= size + offset) && (offset + size > offset));
 
 memcpy(buf, nvdimm->label_data + offset, size);
 }
@@ -75,7 +80,8 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, 
const void *buf,
 DIMMDevice *dimm = DIMM(nvdimm);
 uint64_t backend_offset;
 
-assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+assert(nvdimm->reserve_label_data &&
+   (nvdimm->label_size >= size + offset) && (offset + size > offset));
 
 memcpy(nvdimm->label_data + offset, buf, size);
 
@@ -100,10 +106,33 @@ static void nvdimm_class_init(ObjectClass *oc, void *data)
 nvc->write_label_data = nvdimm_write_label_data;
 }
 
+static bool nvdimm_get_reserve_label_data(Object *obj, Error **errp)
+{
+NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+return nvdimm->reserve_label_data;
+}
+
+static void
+nvdimm_set_reserve_label_data(Object *obj, bool value, Error **errp)
+{
+NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+nvdimm->reserve_label_data = value;
+}
+
+static void nvdimm_init(Object *obj)
+{
+object_property_add_bool(obj, "reserve-label-data",
+ nvdimm_get_reserve_label_data,
+ 

[PATCH v3 1/3] target-i386: add a subsection for migrating vcpu's TSC rate

2015-11-02 Thread Haozhong Zhang
A new subsection 'vmstate_tsc_khz' is added to migrate vcpu's TSC
rate. For the backwards compatibility, this subsection is not migrated
on pc-*-2.4 and older machine types.

Signed-off-by: Haozhong Zhang 
---
 hw/i386/pc.c  |  1 +
 hw/i386/pc_piix.c |  1 +
 hw/i386/pc_q35.c  |  1 +
 include/hw/i386/pc.h  |  1 +
 target-i386/cpu.h |  1 +
 target-i386/machine.c | 21 +
 6 files changed, 26 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0cb8afd..2f2fc93 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1952,6 +1952,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
 pcmc->get_hotplug_handler = mc->get_hotplug_handler;
+pcmc->save_tsc_khz = true;
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
 mc->default_boot_order = "cad";
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 393dcc4..fc71321 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -487,6 +487,7 @@ static void pc_i440fx_2_4_machine_options(MachineClass *m)
 m->alias = NULL;
 m->is_default = 0;
 pcmc->broken_reserved_end = true;
+pcmc->save_tsc_khz = false;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_4);
 }
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 2f8f396..858ed69 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -385,6 +385,7 @@ static void pc_q35_2_4_machine_options(MachineClass *m)
 pc_q35_2_5_machine_options(m);
 m->alias = NULL;
 pcmc->broken_reserved_end = true;
+pcmc->save_tsc_khz = false;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_4);
 }
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 606dbc2..875d099 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -60,6 +60,7 @@ struct PCMachineClass {
 
 /*< public >*/
 bool broken_reserved_end;
+bool save_tsc_khz;
 HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
DeviceState *dev);
 };
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 62f7879..4f2f4a3 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -970,6 +970,7 @@ typedef struct CPUX86State {
 uint32_t sipi_vector;
 bool tsc_valid;
 int64_t tsc_khz;
+int64_t tsc_khz_saved;
 void *kvm_xsave_buf;
 
 uint64_t mcg_cap;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index a18e16e..4d8157c 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -775,6 +775,26 @@ static const VMStateDescription vmstate_xss = {
 }
 };
 
+static bool tsc_khz_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = >env;
+MachineClass *mc = MACHINE_GET_CLASS((qdev_get_machine()));
+PCMachineClass *pcmc = PC_MACHINE_CLASS(mc);
+return env->tsc_khz_saved && pcmc->save_tsc_khz;
+}
+
+static const VMStateDescription vmstate_tsc_khz = {
+.name = "cpu/tsc_khz",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = tsc_khz_needed,
+.fields = (VMStateField[]) {
+VMSTATE_INT64(env.tsc_khz_saved, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -895,6 +915,7 @@ VMStateDescription vmstate_x86_cpu = {
 _msr_hyperv_runtime,
 _avx512,
 _xss,
+_tsc_khz,
 NULL
 }
 };
-- 
2.4.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/3] target-i386: load the migrated vcpu's TSC rate

2015-11-02 Thread Haozhong Zhang
Set vcpu's TSC rate to the migrated value if the user does not specify a
TSC rate by cpu option 'tsc-freq' and a migrated TSC rate does exist. If
KVM supports TSC scaling, guest programs will observe TSC increasing in
the migrated rate other than the host TSC rate.

Signed-off-by: Haozhong Zhang 
---
 target-i386/kvm.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aae5e58..2be70df 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3042,6 +3042,27 @@ int kvm_arch_setup_tsc_khz(CPUState *cs)
 int r;
 
 /*
+ * If a TSC rate is migrated and the user does not specify the
+ * vcpu's TSC rate on the destination, the migrated TSC rate will
+ * be used on the destination after the migration.
+ */
+if (env->tsc_khz_saved && !env->tsc_khz) {
+if (kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL)) {
+r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz_saved);
+if (r < 0) {
+fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
+}
+} else {
+r = -1;
+fprintf(stderr, "KVM doesn't support TSC scaling\n");
+}
+if (r < 0) {
+fprintf(stderr, "Use host TSC frequency instead. "
+"Guest TSC may be inaccurate.\n");
+}
+}
+
+/*
  * Prepare vcpu's TSC rate to be migrated.
  *
  * - If the user specifies the TSC rate by cpu option 'tsc-freq',
-- 
2.4.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-02 Thread James Hogan
On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> The value of the migrated vcpu's TSC rate is determined as below.
>  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> user-specified value will be used.
>  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> present, we will use the TSC rate from KVM (returned by
> KVM_GET_TSC_KHZ).
>  3. Otherwise, we will use the migrated TSC rate.
> 
> Signed-off-by: Haozhong Zhang 
> ---
>  include/sysemu/kvm.h |  2 ++
>  kvm-all.c|  1 +
>  target-arm/kvm.c |  5 +
>  target-i386/kvm.c| 33 +
>  target-mips/kvm.c|  5 +
>  target-ppc/kvm.c |  5 +
>  target-s390x/kvm.c   |  5 +
>  7 files changed, 56 insertions(+)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 461ef65..0ec8b98 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -328,6 +328,8 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry 
> *route,
>  
>  int kvm_arch_msi_data_to_gsi(uint32_t data);
>  
> +int kvm_arch_setup_tsc_khz(CPUState *cpu);
> +
>  int kvm_set_irq(KVMState *s, int irq, int level);
>  int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
>  
> diff --git a/kvm-all.c b/kvm-all.c
> index c442838..1ecaf04 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1757,6 +1757,7 @@ static void do_kvm_cpu_synchronize_post_init(void *arg)
>  {
>  CPUState *cpu = arg;
>  
> +kvm_arch_setup_tsc_khz(cpu);

Sorry if this is a stupid question, but why aren't you doing this from
the i386 kvm_arch_put_registers when level == KVM_PUT_FULL_STATE, rather
than introducing x86 specifics to the generic KVM api?

Cheers
James

>  kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
>  cpu->kvm_vcpu_dirty = false;
>  }
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index 79ef4c6..a724f6d 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -614,3 +614,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  return (data - 32) & 0x;
>  }
> +
> +int kvm_arch_setup_tsc_khz(CPUState *cs)
> +{
> +return 0;
> +}
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 64046cb..aae5e58 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  abort();
>  }
> +
> +int kvm_arch_setup_tsc_khz(CPUState *cs)
> +{
> +X86CPU *cpu = X86_CPU(cs);
> +CPUX86State *env = >env;
> +int r;
> +
> +/*
> + * Prepare vcpu's TSC rate to be migrated.
> + *
> + * - If the user specifies the TSC rate by cpu option 'tsc-freq',
> + *   we will use the user-specified value.
> + *
> + * - If there is neither user-specified TSC rate nor migrated TSC
> + *   rate, we will ask KVM for the TSC rate by calling
> + *   KVM_GET_TSC_KHZ.
> + *
> + * - Otherwise, if there is a migrated TSC rate, we will use the
> + *   migrated value.
> + */
> +if (env->tsc_khz) {
> +env->tsc_khz_saved = env->tsc_khz;
> +} else if (!env->tsc_khz_saved) {
> +r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> +if (r < 0) {
> +fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
> +return r;
> +}
> +env->tsc_khz_saved = r;
> +}
> +
> +return 0;
> +}
> diff --git a/target-mips/kvm.c b/target-mips/kvm.c
> index 12d7db3..fb26d7e 100644
> --- a/target-mips/kvm.c
> +++ b/target-mips/kvm.c
> @@ -687,3 +687,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  abort();
>  }
> +
> +int kvm_arch_setup_tsc_khz(CPUState *cs)
> +{
> +return 0;
> +}
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index ac70f08..c429f0c 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2510,3 +2510,8 @@ int kvmppc_enable_hwrng(void)
>  
>  return kvmppc_enable_hcall(kvm_state, H_RANDOM);
>  }
> +
> +int kvm_arch_setup_tsc_khz(CPUState *cs)
> +{
> +return 0;
> +}
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index c3be180..db5d436 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -2248,3 +2248,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  abort();
>  }
> +
> +int kvm_arch_setup_tsc_khz(CPUState *cs)
> +{
> +return 0;
> +}
> -- 
> 2.4.8
> 


signature.asc
Description: Digital signature


[PATCH v7 00/35] implement vNVDIMM

2015-11-02 Thread Xiao Guangrong
This patchset can be found at:
  https://github.com/xiaogr/qemu.git nvdimm-v7

It is based on pci branch on Michael's tree and the top commit is:
commit 6f96a31a06c2a1 (tests: re-enable vhost-user-test).

Changelog in v7:
- changes from Vladimir Sementsov-Ogievskiy's comments:
  1) let gethugepagesize() realize if fstat is failed instead of get
 normal page size
  2) rename  open_file_path to open_ram_file_path
  3) better log the error message by using error_setg_errno
  4) update commit in the commit log to explain hugepage detection on
 Windows

- changes from Eduardo Habkost's comments:
  1) use 'Error**' to collect error message for qemu_file_get_page_size()
  2) move gethugepagesize() replacement to the same patch to make it
 better for review
  3) introduce qemu_get_file_size to unity the code with raw_getlength()

- changes from Stefan's comments:
  1) check the memory region is large enough to contain DSM output
 buffer

- changes from Eric Blake's comments:
  1) update the shell command in the commit log to generate the patch
 which drops 'pc-dimm' prefix
  
- others:
  pick up Reviewed-by from Stefan, Vladimir Sementsov-Ogievskiy, and
  Eric Blake.

Changelog in v6:
- changes from Stefan's comments:
  1) fix code style of struct naming by CamelCase way
  2) fix offset + length overflow when read/write label data
  3) compile hw/acpi/nvdimm.c for per target so that TARGET_PAGE_SIZE can
 be used to replace getpagesize()

Changelog in v5:
- changes from Michael's comments:
  1) prefix nvdimm_ to everything in NVDIMM source files
  2) make parsing _DSM Arg3 more clear
  3) comment style fix
  5) drop single used definition
  6) fix dirty dsm buffer lost due to memory write happened on host
  7) check dsm buffer if it is big enough to contain input data
  8) use build_append_int_noprefix to store single value to GArray

- changes from Michael's and Igor's comments:
  1) introduce 'nvdimm-support' parameter to control nvdimm
 enablement and it is disabled for 2.4 and its earlier versions
 to make live migration compatible
  2) only reserve 1 RAM page and 4 bytes IO Port for NVDIMM ACPI
 virtualization

- changes from Stefan's comments:
  1) do endian adjustment for the buffer length

- changes from Bharata B Rao's comments:
  1) fix compile on ppc

- others:
  1) the buffer length is directly got from IO read rather than got
 from dsm memory
  2) fix dirty label data lost due to memory write happened on host

Changelog in v4:
- changes from Michael's comments:
  1) show the message, "Memory is not allocated from HugeTlbfs", if file
 based memory is not allocated from hugetlbfs.
  2) introduce function, acpi_get_nvdimm_state(), to get NVDIMMState
 from Machine.
  3) statically define UUID and make its operation more clear
  4) use GArray to build device structures to avoid potential buffer
 overflow
  4) improve comments in the code
  5) improve code style

- changes from Igor's comments:
  1) add NVDIMM ACPI spec document
  2) use serialized method to avoid Mutex
  3) move NVDIMM ACPI's code to hw/acpi/nvdimm.c
  4) introduce a common ASL method used by _DSM for all devices to reduce
 ACPI size
  5) handle UUID in ACPI AML code. BTW, i'd keep handling revision in QEMU
 it's better to upgrade QEMU to support Rev2 in the future

- changes from Stefan's comments:
  1) copy input data from DSM memory to local buffer to avoid potential
 issues as DSM memory is visible to guest. Output data is handled
 in a similar way

- changes from Dan's comments:
  1) drop static namespace as Linux has already supported label-less
 nvdimm devices

- changes from Vladimir's comments:
  1) print better message, "failed to get file size for %s, can't create
 backend on it", if any file operation filed to obtain file size

- others:
  create a git repo on github.com for better review/test

Also, thanks for Eric Blake's review on QAPI's side.

Thank all of you to review this patchset.

Changelog in v3:
There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
Michael for their valuable comments, the patchset finally gets better shape.
- changes from Igor's comments:
  1) abstract dimm device type from pc-dimm and create nvdimm device based on
 dimm, then it uses memory backend device as nvdimm's memory and NUMA has
 easily been implemented.
  2) let file-backend device support any kind of filesystem not only for
 hugetlbfs and let it work on file not only for directory which is
 achieved by extending 'mem-path' - if it's a directory then it works as
 current behavior, otherwise if it's file then directly allocates memory
 from it.
  3) we figure out a unused memory hole below 4G that is 0xFF0 ~ 
 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
 ACPI SSDT/DSDT table will break windows XP.
 BTW, only make SSDT.rev = 2 can not work since the width is only depended
 

Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ

2015-11-02 Thread Paolo Bonzini


On 30/10/2015 07:16, Yunhong Jiang wrote:
> And with this change, we even don't need the module option anymore, we first 
> try the primary handler, which is in hard irq context, and if failed, then
> threaded irq handler. Am I right?

Yes.

> Paolo/Alex, do you want to work on the patch yourself? If not, I will be 
> happy to try this method.

Of course you can do it yourself.

Thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v6 18/33] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-02 Thread Xiao Guangrong



On 10/31/2015 10:15 PM, Xiao Guangrong wrote:



On 10/31/2015 06:52 PM, Vladimir Sementsov-Ogievskiy wrote:

On 30.10.2015 08:56, Xiao Guangrong wrote:

Curretly, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Signed-off-by: Xiao Guangrong 
---
  hw/mem/dimm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 4a63409..498d380 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor *v, void 
*opaque,
  int64_t value;
  MemoryRegion *mr;
  DIMMDevice *dimm = DIMM(obj);
+DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
-mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+mr = ddc->get_memory_region(dimm);
  value = memory_region_size(mr);
  visit_type_int(v, , name, errp);


Isn't it better to add Error** parameter to get_memory_region and not mix 
returning error in errp
parameter and aborting through _abort?



Yes, it's better, will do. Thanks for your suggestion.


Vladimir,

After checking the current code, the @errp in pc-dimm's get_memory_region() is 
useless
and all its caller did not check the return value as they assumed the backend 
memory of
pc-dimm is always properly initialized.

So we catch the error condition internally instead of propagating it to the 
caller:

 static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
 {
-return host_memory_backend_get_memory(dimm->hostmem, _abort);
+Error *local_err = NULL;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(dimm->hostmem, _err);
+
+/*
+ * plug a pc-dimm device whose backend memory was not properly
+ * initialized?
+ */
+assert(!local_err && mr);
+return mr;
 }

please review it in the new version i have CCed you.

Thanks for your review.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-11-02 Thread Haozhong Zhang
This patchset enables QEMU to save/restore vcpu's TSC rate during the
migration on machine types pc-*-2.5 or newer.

On the source machine:
 * If the vcpu's TSC rate is specified by the cpu option 'tsc-freq',
   then this user-specified TSC rate will be migrated.
 * Otherwise, the TSC rate returned by KVM_GET_TSC_KHZ will be
   migrated. For a fresh VM, this is the host TSC rate.

On the destination machine:
 * If the vcpu's TSC rate is specified by the cpu option 'tsc-freq',
   then QEMU will try to use this user-specified TSC rate rather than
   the migrated value.
 * Otherwise, QEMU will try to use the migrated TSC rate. If KVM on
   the destination supports TSC scaling, guest programs will observe a
   consistent TSC rate across the migration. If TSC scaling is not
   supported, the migration will not be aborted and QEMU will behave
   like before, i.e using the host TSC rate instead.

Changes in v3:
 * Change the cpu option 'save-tsc-freq' to an internal flag.
 * Remove the cpu option 'load-tsc-freq' and change the logic of
   loading the migrated TSC rate as above.
 * Move the setup of migrated TSC rate back to
   do_kvm_cpu_synchronize_post_init().

Changes in v2:
 * Add a pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' to
   control the migration of vcpu's TSC rate.
 * Move all logic of setting TSC rate to target-i386.
 * Remove the duplicated TSC setup in kvm_arch_init_vcpu().

Haozhong Zhang (3):
  target-i386: add a subsection for migrating vcpu's TSC rate
  target-i386: calculate vcpu's TSC rate to be migrated
  target-i386: load the migrated vcpu's TSC rate

 hw/i386/pc.c  |  1 +
 hw/i386/pc_piix.c |  1 +
 hw/i386/pc_q35.c  |  1 +
 include/hw/i386/pc.h  |  1 +
 include/sysemu/kvm.h  |  2 ++
 kvm-all.c |  1 +
 target-arm/kvm.c  |  5 +
 target-i386/cpu.h |  1 +
 target-i386/kvm.c | 54 +++
 target-i386/machine.c | 21 
 target-mips/kvm.c |  5 +
 target-ppc/kvm.c  |  5 +
 target-s390x/kvm.c|  5 +
 13 files changed, 103 insertions(+)

-- 
2.4.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >