[Qemu-devel] [PATCH v2 07/13] vfio: VFIO core

2012-05-21 Thread Alex Williamson
VFIO is a secure user level driver for use with both virtual machines
and user level drivers.  VFIO makes use of IOMMU groups to ensure the
isolation of devices in use, allowing unprivileged user access.  It's
intended that VFIO will replace KVM device assignment and UIO drivers
(in cases where the target platform includes a sufficiently capable
IOMMU).

New in this version of VFIO is support for IOMMU groups managed
through the IOMMU core as well as a rework of the API, removing the
group merge interface.  We now go back to a model more similar to
original VFIO with UIOMMU support where the file descriptor obtained
from /dev/vfio/vfio allows access to the IOMMU, but only after a
group is added, avoiding the previous privilege issues with this type
of model.  IOMMU support is also now fully modular as IOMMUs have
vastly different interface requirements on different platforms.  VFIO
users are able to query and initialize the IOMMU model of their
choice.

Please see the follow-on Documentation commit for further description
and usage example.

Signed-off-by: Alex Williamson 
---

 Documentation/ioctl/ioctl-number.txt |1 
 MAINTAINERS  |8 
 drivers/Kconfig  |2 
 drivers/Makefile |1 
 drivers/vfio/Kconfig |8 
 drivers/vfio/Makefile|1 
 drivers/vfio/vfio.c  | 1406 ++
 include/linux/vfio.h |  366 +
 8 files changed, 1793 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/vfio.c
 create mode 100644 include/linux/vfio.h

diff --git a/Documentation/ioctl/ioctl-number.txt 
b/Documentation/ioctl/ioctl-number.txt
index e34b531..111e30a 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -88,6 +88,7 @@ Code  Seq#(hex)   Include FileComments
and kernel/power/user.c
 '8'all SNP8023 advanced NIC card

+';'64-6F   linux/vfio.h
 '@'00-0F   linux/radeonfb.hconflict!
 '@'00-0F   drivers/video/aty/aty128fb.cconflict!
 'A'00-1F   linux/apm_bios.hconflict!
diff --git a/MAINTAINERS b/MAINTAINERS
index b362709..5aca4ff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7229,6 +7229,14 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/
 
+VFIO DRIVER
+M: Alex Williamson 
+L: k...@vger.kernel.org
+S: Maintained
+F: Documentation/vfio.txt
+F: drivers/vfio/
+F: include/linux/vfio.h
+
 VIDEOBUF2 FRAMEWORK
 M: Pawel Osciak 
 M: Marek Szyprowski 
diff --git a/drivers/Kconfig b/drivers/Kconfig
index d236aef..46eb115 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -112,6 +112,8 @@ source "drivers/auxdisplay/Kconfig"
 
 source "drivers/uio/Kconfig"
 
+source "drivers/vfio/Kconfig"
+
 source "drivers/vlynq/Kconfig"
 
 source "drivers/virtio/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 95952c8..fe1880a 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_ATM) += atm/
 obj-$(CONFIG_FUSION)   += message/
 obj-y  += firewire/
 obj-$(CONFIG_UIO)  += uio/
+obj-$(CONFIG_VFIO) += vfio/
 obj-y  += cdrom/
 obj-y  += auxdisplay/
 obj-$(CONFIG_PCCARD)   += pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 000..9acb1e7
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,8 @@
+menuconfig VFIO
+   tristate "VFIO Non-Privileged userspace driver framework"
+   depends on IOMMU_API
+   help
+ VFIO provides a framework for secure userspace device drivers.
+ See Documentation/vfio.txt for more details.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 000..7500a67
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_VFIO) += vfio.o
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
new file mode 100644
index 000..6558eef
--- /dev/null
+++ b/drivers/vfio/vfio.c
@@ -0,0 +1,1406 @@
+/*
+ * VFIO core
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Alexander Graf


On 22.05.2012, at 08:11, Alexey Kardashevskiy  wrote:

> On 22/05/12 15:52, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 05:44, Alexey Kardashevskiy  wrote:
>> 
>>> On 22/05/12 13:21, Alexander Graf wrote:
 
 
 On 22.05.2012, at 04:02, Benjamin Herrenschmidt  
 wrote:
 
> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>> Alexander,
>> 
>> Is that any better? :)
> 
> Alex (Graf that is), ping ?
> 
> The original patch from Alexey was fine btw.
> 
> VFIO will always call things with the existing capability offset so
> there's no real risk of doing the wrong thing or break the list or
> anything.
> 
> IE. A small simple patch that addresses the problem :-)
> 
> The new patch is a bit more "robust" I believe, I don't think we need to
> go too far to fix a problem we don't have. But we need a fix for the
> real issue and the simple patch does it neatly from what I can
> understand.
> 
> Cheers,
> Ben.
> 
>> 
>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>> * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>  uint8_t offset, uint8_t size)
>> {
>> -uint8_t *config;
>> +uint8_t *config, existing;
 
 Existing is a pointer to the target dev's config space, right?
>>> 
>>> Yes.
>>> 
>>   int i, overlapping_cap;
>> 
>> +existing = pci_find_capability(pdev, cap_id);
>> +if (existing) {
>> +if (offset && (existing != offset)) {
>> +return -EEXIST;
>> +}
>> +for (i = existing; i < size; ++i) {
 
 So how does this possibly make sense?
>>> 
>>> Although I do not expect VFIO to add capabilities (does not make sense), I 
>>> still want to double
>>> check that this space has not been tried to use by someone else.
>> 
>> i is an int. existing is a uint8_t*.
> 
> 
> It was there before me. This function already does a loop and this is how it 
> was coded at the first place.

Also, while at it, please add some comments at least for the code you add that 
explain why you do the things you do :).

Alex




[Qemu-devel] [PATCH v2 10/13] pci: export pci_user functions for use by other drivers

2012-05-21 Thread Alex Williamson
VFIO PCI support will make use of these for user initiated
PCI config accesses.

Signed-off-by: Alex Williamson 
Acked-by: Bjorn Helgaas 
---

 drivers/pci/access.c |6 --
 drivers/pci/pci.h|7 ---
 include/linux/pci.h  |8 
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 2a58164..ba91a7e 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -162,7 +162,8 @@ int pci_user_read_config_##size 
\
if (ret > 0)\
ret = -EINVAL;  \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_read_config_##size);
 
 /* Returns 0 on success, negative values indicate error. */
 #define PCI_USER_WRITE_CONFIG(size,type)   \
@@ -181,7 +182,8 @@ int pci_user_write_config_##size
\
if (ret > 0)\
ret = -EINVAL;  \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
 
 PCI_USER_READ_CONFIG(byte, u8)
 PCI_USER_READ_CONFIG(word, u16)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e494347..f2dcc46 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -86,13 +86,6 @@ static inline bool pci_is_bridge(struct pci_dev *pci_dev)
return !!(pci_dev->subordinate);
 }
 
-extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
-extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
-extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 
*val);
-extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
-extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
-extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 
val);
-
 struct pci_vpd_ops {
ssize_t (*read)(struct pci_dev *dev, loff_t pos, size_t count, void 
*buf);
ssize_t (*write)(struct pci_dev *dev, loff_t pos, size_t count, const 
void *buf);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2559735..0cf57d5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -770,6 +770,14 @@ static inline int pci_write_config_dword(const struct 
pci_dev *dev, int where,
return pci_bus_write_config_dword(dev->bus, dev->devfn, where, val);
 }
 
+/* user-space driven config access */
+int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
+int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
+int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 *val);
+int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
+int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
+int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 val);
+
 int __must_check pci_enable_device(struct pci_dev *dev);
 int __must_check pci_enable_device_io(struct pci_dev *dev);
 int __must_check pci_enable_device_mem(struct pci_dev *dev);




Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Alexander Graf


On 22.05.2012, at 08:11, Alexey Kardashevskiy  wrote:

> On 22/05/12 15:52, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 05:44, Alexey Kardashevskiy  wrote:
>> 
>>> On 22/05/12 13:21, Alexander Graf wrote:
 
 
 On 22.05.2012, at 04:02, Benjamin Herrenschmidt  
 wrote:
 
> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>> Alexander,
>> 
>> Is that any better? :)
> 
> Alex (Graf that is), ping ?
> 
> The original patch from Alexey was fine btw.
> 
> VFIO will always call things with the existing capability offset so
> there's no real risk of doing the wrong thing or break the list or
> anything.
> 
> IE. A small simple patch that addresses the problem :-)
> 
> The new patch is a bit more "robust" I believe, I don't think we need to
> go too far to fix a problem we don't have. But we need a fix for the
> real issue and the simple patch does it neatly from what I can
> understand.
> 
> Cheers,
> Ben.
> 
>> 
>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>> * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>  uint8_t offset, uint8_t size)
>> {
>> -uint8_t *config;
>> +uint8_t *config, existing;
 
 Existing is a pointer to the target dev's config space, right?
>>> 
>>> Yes.
>>> 
>>   int i, overlapping_cap;
>> 
>> +existing = pci_find_capability(pdev, cap_id);
>> +if (existing) {
>> +if (offset && (existing != offset)) {
>> +return -EEXIST;
>> +}
>> +for (i = existing; i < size; ++i) {
 
 So how does this possibly make sense?
>>> 
>>> Although I do not expect VFIO to add capabilities (does not make sense), I 
>>> still want to double
>>> check that this space has not been tried to use by someone else.
>> 
>> i is an int. existing is a uint8_t*.
> 
> 
> It was there before me. This function already does a loop and this is how it 
> was coded at the first place.

Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...

> 
> 
>> +if (pdev->used[i]) {
>> +return -EFAULT;
>> +}
>> +}
>> +memset(pdev->used + offset, 0xFF, size);
 Why?
>>> 
>>> Because I am marking the space this capability takes as used.
>> 
>> But if it already existed (at the same offset), it should be set used 
>> already, no? Unless size > existing size, in which case you might overwrite 
>> data in the next chunk, no?
> 
> 
> No, it does not exist for VFIO - VFIO read the config space from the host 
> kernel first and then calls msi_init or msix_init or whatever_else_init 
> depending on what it got from the host kernel. And these xxx_init() functions 
> eventually call pci_add_capability().

So why would the function that populates the config space initially not set the 
used flag correctly?

> 
> Sure we can either implement own msi_init/msix_init (and may be others in the 
> future) for VFIO (which would do all the same as other QEMU devices except 
> touching the capabilities)  OR  hack msi_init/msix_init not to touch 
> capabilities if they exist.

No, calling the internal functions sounds fine to me. It's the step before that 
irritates me. VFIO shouldn't differ too much from an emulated device wrt its 
config space population really.

> 
> 
> 
>> +/* Make capability read-only by default */
>> +memset(pdev->wmask + offset, 0, size);
 Why?
>>> 
>>> Because the pci_add_capability() does it for a new capability by default.
>> 
>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
> 
> I am trying to make it as a single chunk which is as small as possible.

No, you're needlessly duplicating code which is a bad idea :). Please reuse as 
much of the existing function as possible, unless it really doesn't make sense.

> 
> 
> If it helps, below is the same patch with extended context to see what is 
> going on in that function.
> 
> 
> 
> 
> 
> 
> hw/pci.c |   20 +++-
> 1 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 63a8219..7008a42 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
> is_default_rom)
> ptr = memory_region_get_ram_ptr(&pdev->rom);
> load_image(path, ptr);
> g_free(path);
> 
> if (is_default_rom) {
> /* Only the default rom images will be patched (if needed). */
> pci_patch_ids(pdev, ptr, size);
> }
> 
> qemu_put_ram_ptr(ptr);
> 
> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
> 
> return 0;
> }
> 
> static void pci_del_option_rom(PCIDevice *pdev)
> {
> if (!pdev->has_rom)
> return;
> 
> vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
> memor

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Alexey Kardashevskiy
On 22/05/12 15:52, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 05:44, Alexey Kardashevskiy  wrote:
> 
>> On 22/05/12 13:21, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt  
>>> wrote:
>>>
 On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
> Alexander,
>
> Is that any better? :)

 Alex (Graf that is), ping ?

 The original patch from Alexey was fine btw.

 VFIO will always call things with the existing capability offset so
 there's no real risk of doing the wrong thing or break the list or
 anything.

 IE. A small simple patch that addresses the problem :-)

 The new patch is a bit more "robust" I believe, I don't think we need to
 go too far to fix a problem we don't have. But we need a fix for the
 real issue and the simple patch does it neatly from what I can
 understand.

 Cheers,
 Ben.

>
> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
> * in pci config space */
> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>   uint8_t offset, uint8_t size)
> {
> -uint8_t *config;
> +uint8_t *config, existing;
>>>
>>> Existing is a pointer to the target dev's config space, right?
>>
>> Yes.
>>
>int i, overlapping_cap;
>
> +existing = pci_find_capability(pdev, cap_id);
> +if (existing) {
> +if (offset && (existing != offset)) {
> +return -EEXIST;
> +}
> +for (i = existing; i < size; ++i) {
>>>
>>> So how does this possibly make sense?
>>
>> Although I do not expect VFIO to add capabilities (does not make sense), I 
>> still want to double
>> check that this space has not been tried to use by someone else.
> 
> i is an int. existing is a uint8_t*.


It was there before me. This function already does a loop and this is how it 
was coded at the first place.


> +if (pdev->used[i]) {
> +return -EFAULT;
> +}
> +}
> +memset(pdev->used + offset, 0xFF, size);
>>> Why?
>>
>> Because I am marking the space this capability takes as used.
> 
> But if it already existed (at the same offset), it should be set used 
> already, no? Unless size > existing size, in which case you might overwrite 
> data in the next chunk, no?


No, it does not exist for VFIO - VFIO read the config space from the host 
kernel first and then calls msi_init or msix_init or whatever_else_init 
depending on what it got from the host kernel. And these xxx_init() functions 
eventually call pci_add_capability().

Sure we can either implement own msi_init/msix_init (and may be others in the 
future) for VFIO (which would do all the same as other QEMU devices except 
touching the capabilities)  OR  hack msi_init/msix_init not to touch 
capabilities if they exist.



> +/* Make capability read-only by default */
> +memset(pdev->wmask + offset, 0, size);
>>> Why?
>>
>> Because the pci_add_capability() does it for a new capability by default.
> 
> Hrm. So you're copying code? Can't you merge the overwrite and write cases?

I am trying to make it as a single chunk which is as small as possible.


If it helps, below is the same patch with extended context to see what is going 
on in that function.






hw/pci.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 63a8219..7008a42 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom)
 ptr = memory_region_get_ram_ptr(&pdev->rom);
 load_image(path, ptr);
 g_free(path);
 
 if (is_default_rom) {
 /* Only the default rom images will be patched (if needed). */
 pci_patch_ids(pdev, ptr, size);
 }
 
 qemu_put_ram_ptr(ptr);
 
 pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
 
 return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
 if (!pdev->has_rom)
 return;
 
 vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
 memory_region_destroy(&pdev->rom);
 pdev->has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size)
 {
-uint8_t *config;
+uint8_t *config, existing;
 int i, overlapping_cap;
 
+existing = pci_find_capability(pdev, cap_id);
+if (existing) {
+if (offset && (existing != offset)) {
+return -EEXIST;
+}
+for (i = existing; i < size; ++i) {
+if (pdev->used[i]) {
+return -EFAULT;
+}
+}
+memset(pdev

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Alexander Graf


On 22.05.2012, at 05:44, Alexey Kardashevskiy  wrote:

> On 22/05/12 13:21, Alexander Graf wrote:
>> 
>> 
>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt  
>> wrote:
>> 
>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
 Alexander,
 
 Is that any better? :)
>>> 
>>> Alex (Graf that is), ping ?
>>> 
>>> The original patch from Alexey was fine btw.
>>> 
>>> VFIO will always call things with the existing capability offset so
>>> there's no real risk of doing the wrong thing or break the list or
>>> anything.
>>> 
>>> IE. A small simple patch that addresses the problem :-)
>>> 
>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>> go too far to fix a problem we don't have. But we need a fix for the
>>> real issue and the simple patch does it neatly from what I can
>>> understand.
>>> 
>>> Cheers,
>>> Ben.
>>> 
 
 @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
 * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
   uint8_t offset, uint8_t size)
 {
 -uint8_t *config;
 +uint8_t *config, existing;
>> 
>> Existing is a pointer to the target dev's config space, right?
> 
> Yes.
> 
int i, overlapping_cap;
 
 +existing = pci_find_capability(pdev, cap_id);
 +if (existing) {
 +if (offset && (existing != offset)) {
 +return -EEXIST;
 +}
 +for (i = existing; i < size; ++i) {
>> 
>> So how does this possibly make sense?
> 
> Although I do not expect VFIO to add capabilities (does not make sense), I 
> still want to double
> check that this space has not been tried to use by someone else.

i is an int. existing is a uint8_t*.

> 
 +if (pdev->used[i]) {
 +return -EFAULT;
 +}
 +}
 +memset(pdev->used + offset, 0xFF, size);
>> Why?
> 
> Because I am marking the space this capability takes as used.

But if it already existed (at the same offset), it should be set used already, 
no? Unless size > existing size, in which case you might overwrite data in the 
next chunk, no?

> 
 +/* Make capability read-only by default */
 +memset(pdev->wmask + offset, 0, size);
>> Why?
> 
> Because the pci_add_capability() does it for a new capability by default.

Hrm. So you're copying code? Can't you merge the overwrite and write cases?

Alex

> 
> 
 +/* Check capability by default */
 +memset(pdev->cmask + offset, 0xFF, size);
>> 
>> I don't understand this part either.
> 
> The pci_add_capability() does it for a new capability by default.
> 
> 
> 
>> 
>> Alex
>> 
 +return existing;
 +}
 +
if (!offset) {
offset = pci_find_space(pdev, size);
if (!offset) {
return -ENOSPC;
 
 
 
 
 
 
 On 14/05/12 13:49, Alexey Kardashevskiy wrote:
> On 12/05/12 00:13, Alexander Graf wrote:
>> 
>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>> 
>>> 11.05.2012 20:52, Alexander Graf написал:
 
 On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
 
> Normally the pci_add_capability is called on devices to add new
> capability. This is ok for emulated devices which capabilities list
> is being built by QEMU.
> 
> In the case of VFIO the capability may already exist and adding new
> capability into the beginning of the linked list may create a loop.
> 
> For example, the old code destroys the following config
> of PCIe Intel E1000E:
> 
> before adding PCI_CAP_ID_MSI (0x05):
> 0x34: 0xC8
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xE0
> 0xE0: 0x10 0x00
> 
> after:
> 0x34: 0xD0
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xC8
> 0xE0: 0x10 0x00
> 
> As result capabilities 0x01 and 0x05 point to each other.
> 
> The proposed patch does not change capability pointers when
> the same type capability is about to add.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> hw/pci.c |   10 ++
> 1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index aa0c0b8..1f7c924 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, 
> uint8_t cap_id,
>  }
> 
>  config = pdev->config + offset;
> -config[PCI_CAP_LIST_ID] = cap_id;
> -config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> -pdev->config[PCI_CAPABILITY_LIST] = offset;
> -pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +if (config[PCI_CAP_LIST

Re: [Qemu-devel] qemu-microblaze-system

2012-05-21 Thread Peter Crosthwaite
Hi Michael,

The microblaze linux kernel is available at:

http://wiki.xilinx.com/

If you wish to do your own kernel config from scratch. There is some
brief documentation there on how to do a bringup.

The qemu port for mb maintained by PetaLogix and I notice you are
using a student email domain, so you likely eligible for a free
academic license of the petalogix tools which automate the board
bringup/kernel config etc, (and it will include our version of QEMU
which can boot a lot more FPGA-microblaze platforms than just the
ml605 ans s3adsp):

www.petalogix.com

I have a Xilinx XPS project that exactly matches the ML605 qemu
platform that I can send you as well, and I can also send you raw
kernel config if that helps?

Peter

On Mon, May 14, 2012 at 7:42 PM, Michael Trimarchi
 wrote:
> Hi all,
>
> I would like to start using/testing the qemu microblaze system, but I don't 
> find information about linux kernel configuration that match the ml605 
> hardware design? Is there any howto?
>
> Michael
>



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function

2012-05-21 Thread Michael S. Tsirkin
On Tue, May 22, 2012 at 08:56:12AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2012-05-22 at 01:22 +0300, Michael S. Tsirkin wrote:
> > > Again, from which originator ? From a given initiator, nothing
> > bypasses
> > > anything, so the right thing to do here is a full mb(). However, I
> > > suspect what you are talking about here is read -responses- not
> > > bypassing writes in the direction of the response (ie, the
> > "flushing"
> > > semantic of reads) which is a different matter.
> > 
> > No. My spec says:
> > A3, A4
> > A Posted Request must be able to pass Non-Posted Requests to avoid
> > deadlocks.
> 
> Right, a read + write can become write + read at the target, I forgot
> about that, or you can deadlock due to the flush semantics, but a write
> + read must remain in order or am I missing something ?

Exactly.

> And write + read afaik is typically the one that x86 can re-order
> without a barrier isn't it ?

AFAIK without a barrier, x86 can reorder them however you initiate them.

-- 
MST



Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions

2012-05-21 Thread Benjamin Herrenschmidt
On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote:
> So here's the latest try :-) I've kept is simple, I don't
> add anything to map/unmap at this stage, so we might still
> have a problem with drivers who do that a lot without any
> explicit barrier (ahci ?). My preference is to also add
> barriers to map/unmap by default but we can discuss it.
> 
> Note that I've put the barrier in an inline "helper" so
> we can nag about the type of barriers, or try to be smart,
> or use flags in the DMAContext or whatever we want reasonably
> easily as we have a single spot to modify.
> 
> I'm now going to see if I can measure a performance hit on
> my x86 laptop, but if somebody who has existing x86 guest setups
> wants to help, that would be much welcome :-) 

Also an idea I had to make it easier & avoid a clutter of
_relaxed variants of everything:

Can we change DMADirection to be a DMAAttributes or DMAFlags,
and basically have more than just the direction in there ?

That way we can easily have "relaxed" be a flag, and thus
apply to most of the accessors without adding a bunch of
different variant (especially since we already have that
"cancel" variant for map, I don't want to make 2 more
for _relaxed).

If I do that change I'd also like to change the enum so that
read and write are distinct bits, and so we can set both for
bidirectional. This makes more sense if we ever have to play
with cache issues etc... and might help using more optimal
barriers. It might also fit better to IOMMUs who provide
distinct read vs. write permissions.

Cheers,
Ben.





Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function

2012-05-21 Thread Rusty Russell
On Tue, 22 May 2012 07:58:17 +1000, Benjamin Herrenschmidt 
 wrote:
> On Mon, 2012-05-21 at 15:18 +0300, Michael S. Tsirkin wrote:
> > But I can also live with a global flag "latest_dma_read"
> > and on read we could do
> > if (unlikely(latest_dma_read))
> > smp_mb();
> > 
> > if you really insist on it
> > though I do think it's inelegant.
> 
> Again, why do you object on simply making the default accessors fully
> ordered ? Do you think it will be a measurable different in most cases ?
> 
> Shouldn't we measure it first ?

Yes.  It seems clear to me that qemu's default DMA operations should be
strictly ordered.  It's just far easier to get right.

After that, we can get tricky with conditional barriers, and we can get
tricky with using special unordered variants in critical drivers, but I
really don't want to be chasing subtle SMP ordering problems in
production.

If you're working on ARM or PPC, "it works for x86" makes it *worse*,
not better, since you have fewer users to find bugs.

Cheers,
Rusty.




[Qemu-devel] [PATCH] Add a memory barrier to DMA functions

2012-05-21 Thread Benjamin Herrenschmidt
The emulated devices can run simultaneously with the guest, so
we need to be careful with ordering of load and stores done by
them to the guest system memory, which need to be observed in
the right order by the guest operating system.

This adds a barrier call to the basic DMA read/write ops which
is currently implemented as a smp_mb(), but could be later
improved for more fine grained control of barriers.

Additionally, a _relaxed() variant of the accessors is provided
to easily convert devices who would be performance sensitive
and negatively impacted by the change.

Signed-off-by: Benjamin Herrenschmidt 
---

So here's the latest try :-) I've kept is simple, I don't
add anything to map/unmap at this stage, so we might still
have a problem with drivers who do that a lot without any
explicit barrier (ahci ?). My preference is to also add
barriers to map/unmap by default but we can discuss it.

Note that I've put the barrier in an inline "helper" so
we can nag about the type of barriers, or try to be smart,
or use flags in the DMAContext or whatever we want reasonably
easily as we have a single spot to modify.

I'm now going to see if I can measure a performance hit on
my x86 laptop, but if somebody who has existing x86 guest setups
wants to help, that would be much welcome :-)

 dma.h |   54 --
 1 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/dma.h b/dma.h
index f1fcb71..0d57e50 100644
--- a/dma.h
+++ b/dma.h
@@ -13,6 +13,7 @@
 #include 
 #include "hw/hw.h"
 #include "block.h"
+#include "kvm.h"
 
 typedef struct DMAContext DMAContext;
 typedef struct ScatterGatherEntry ScatterGatherEntry;
@@ -70,6 +71,30 @@ typedef struct DMAContext {
 DMAUnmapFunc *unmap;
 } DMAContext;
 
+static inline void dma_barrier(DMAContext *dma, DMADirection dir)
+{
+/*
+ * This is called before DMA read and write operations
+ * unless the _relaxed form is used and is responsible
+ * for providing some sane ordering of accesses vs
+ * concurrently running VCPUs.
+ *
+ * Users of map(), unmap() or lower level st/ld_*
+ * operations are responsible for providing their own
+ * ordering via barriers.
+ *
+ * This primitive implementation does a simple smp_mb()
+ * before each operation which provides pretty much full
+ * ordering.
+ *
+ * A smarter implementation can be devised if needed to
+ * use lighter barriers based on the direction of the
+ * transfer, the DMA context, etc...
+ */
+if (kvm_enabled())
+smp_mb();
+}
+
 static inline bool dma_has_iommu(DMAContext *dma)
 {
 return !!dma;
@@ -93,8 +118,9 @@ static inline bool dma_memory_valid(DMAContext *dma,
 
 int iommu_dma_memory_rw(DMAContext *dma, dma_addr_t addr,
 void *buf, dma_addr_t len, DMADirection dir);
-static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
-void *buf, dma_addr_t len, DMADirection dir)
+static inline int dma_memory_rw_relaxed(DMAContext *dma, dma_addr_t addr,
+void *buf, dma_addr_t len,
+DMADirection dir)
 {
 if (!dma_has_iommu(dma)) {
 /* Fast-path for no IOMMU */
@@ -106,6 +132,28 @@ static inline int dma_memory_rw(DMAContext *dma, 
dma_addr_t addr,
 }
 }
 
+static inline int dma_memory_read_relaxed(DMAContext *dma, dma_addr_t addr,
+  void *buf, dma_addr_t len)
+{
+return dma_memory_rw_relaxed(dma, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+}
+
+static inline int dma_memory_write_relaxed(DMAContext *dma, dma_addr_t addr,
+   const void *buf, dma_addr_t len)
+{
+return dma_memory_rw_relaxed(dma, addr, (void *)buf, len,
+ DMA_DIRECTION_FROM_DEVICE);
+}
+
+static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+void *buf, dma_addr_t len,
+DMADirection dir)
+{
+dma_barrier(dma, dir);
+
+return dma_memory_rw_relaxed(dma, addr, buf, len, dir);
+}
+
 static inline int dma_memory_read(DMAContext *dma, dma_addr_t addr,
   void *buf, dma_addr_t len)
 {
@@ -124,6 +172,8 @@ int iommu_dma_memory_set(DMAContext *dma, dma_addr_t addr, 
uint8_t c,
 static inline int dma_memory_set(DMAContext *dma, dma_addr_t addr,
  uint8_t c, dma_addr_t len)
 {
+dma_barrier(dma, DMA_DIRECTION_FROM_DEVICE);
+
 if (!dma_has_iommu(dma)) {
 /* Fast-path for no IOMMU */
 cpu_physical_memory_set(addr, c, len);





Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Alexey Kardashevskiy
On 22/05/12 13:21, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 04:02, Benjamin Herrenschmidt  
> wrote:
> 
>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>> Alexander,
>>>
>>> Is that any better? :)
>>
>> Alex (Graf that is), ping ?
>>
>> The original patch from Alexey was fine btw.
>>
>> VFIO will always call things with the existing capability offset so
>> there's no real risk of doing the wrong thing or break the list or
>> anything.
>>
>> IE. A small simple patch that addresses the problem :-)
>>
>> The new patch is a bit more "robust" I believe, I don't think we need to
>> go too far to fix a problem we don't have. But we need a fix for the
>> real issue and the simple patch does it neatly from what I can
>> understand.
>>
>> Cheers,
>> Ben.
>>
>>>
>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>  * in pci config space */
>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>uint8_t offset, uint8_t size)
>>> {
>>> -uint8_t *config;
>>> +uint8_t *config, existing;
> 
> Existing is a pointer to the target dev's config space, right?

Yes.

>>> int i, overlapping_cap;
>>>
>>> +existing = pci_find_capability(pdev, cap_id);
>>> +if (existing) {
>>> +if (offset && (existing != offset)) {
>>> +return -EEXIST;
>>> +}
>>> +for (i = existing; i < size; ++i) {
> 
> So how does this possibly make sense?

Although I do not expect VFIO to add capabilities (does not make sense), I 
still want to double
check that this space has not been tried to use by someone else.

>>> +if (pdev->used[i]) {
>>> +return -EFAULT;
>>> +}
>>> +}
>>> +memset(pdev->used + offset, 0xFF, size);
> Why?

Because I am marking the space this capability takes as used.

>>> +/* Make capability read-only by default */
>>> +memset(pdev->wmask + offset, 0, size);
> Why?

Because the pci_add_capability() does it for a new capability by default.


>>> +/* Check capability by default */
>>> +memset(pdev->cmask + offset, 0xFF, size);
> 
> I don't understand this part either.

The pci_add_capability() does it for a new capability by default.



> 
> Alex
> 
>>> +return existing;
>>> +}
>>> +
>>> if (!offset) {
>>> offset = pci_find_space(pdev, size);
>>> if (!offset) {
>>> return -ENOSPC;
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
 On 12/05/12 00:13, Alexander Graf wrote:
>
> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>
>> 11.05.2012 20:52, Alexander Graf написал:
>>>
>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>
 Normally the pci_add_capability is called on devices to add new
 capability. This is ok for emulated devices which capabilities list
 is being built by QEMU.

 In the case of VFIO the capability may already exist and adding new
 capability into the beginning of the linked list may create a loop.

 For example, the old code destroys the following config
 of PCIe Intel E1000E:

 before adding PCI_CAP_ID_MSI (0x05):
 0x34: 0xC8
 0xC8: 0x01 0xD0
 0xD0: 0x05 0xE0
 0xE0: 0x10 0x00

 after:
 0x34: 0xD0
 0xC8: 0x01 0xD0
 0xD0: 0x05 0xC8
 0xE0: 0x10 0x00

 As result capabilities 0x01 and 0x05 point to each other.

 The proposed patch does not change capability pointers when
 the same type capability is about to add.

 Signed-off-by: Alexey Kardashevskiy 
 ---
 hw/pci.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

 diff --git a/hw/pci.c b/hw/pci.c
 index aa0c0b8..1f7c924 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, 
 uint8_t cap_id,
   }

   config = pdev->config + offset;
 -config[PCI_CAP_LIST_ID] = cap_id;
 -config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
 -pdev->config[PCI_CAPABILITY_LIST] = offset;
 -pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
 +if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>
>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do 
>>> a loop through all capabilities, check if the one you want to add is 
>>> there already and if so either
>>> * replace the existing one or
>>> * drop out and not write the new one in.
>
>  * hw_error :)
>
>>>
>>> I'm not sure which way would be more natural.
>>
>> There is a third option - add another function, lets call it
>> pci_fixup_capability() which would do whatever pci_add_capability

Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Alexander Graf


On 22.05.2012, at 04:02, Benjamin Herrenschmidt  
wrote:

> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>> Alexander,
>> 
>> Is that any better? :)
> 
> Alex (Graf that is), ping ?
> 
> The original patch from Alexey was fine btw.
> 
> VFIO will always call things with the existing capability offset so
> there's no real risk of doing the wrong thing or break the list or
> anything.
> 
> IE. A small simple patch that addresses the problem :-)
> 
> The new patch is a bit more "robust" I believe, I don't think we need to
> go too far to fix a problem we don't have. But we need a fix for the
> real issue and the simple patch does it neatly from what I can
> understand.
> 
> Cheers,
> Ben.
> 
>> 
>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>  * in pci config space */
>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>uint8_t offset, uint8_t size)
>> {
>> -uint8_t *config;
>> +uint8_t *config, existing;

Existing is a pointer to the target dev's config space, right?

>> int i, overlapping_cap;
>> 
>> +existing = pci_find_capability(pdev, cap_id);
>> +if (existing) {
>> +if (offset && (existing != offset)) {
>> +return -EEXIST;
>> +}
>> +for (i = existing; i < size; ++i) {

So how does this possibly make sense?

>> +if (pdev->used[i]) {
>> +return -EFAULT;
>> +}
>> +}
>> +memset(pdev->used + offset, 0xFF, size);

Why?

>> +/* Make capability read-only by default */
>> +memset(pdev->wmask + offset, 0, size);

Why?

>> +/* Check capability by default */
>> +memset(pdev->cmask + offset, 0xFF, size);

I don't understand this part either.


Alex

>> +return existing;
>> +}
>> +
>> if (!offset) {
>> offset = pci_find_space(pdev, size);
>> if (!offset) {
>> return -ENOSPC;
>> 
>> 
>> 
>> 
>> 
>> 
>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>> On 12/05/12 00:13, Alexander Graf wrote:
 
 On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
 
> 11.05.2012 20:52, Alexander Graf написал:
>> 
>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>> 
>>> Normally the pci_add_capability is called on devices to add new
>>> capability. This is ok for emulated devices which capabilities list
>>> is being built by QEMU.
>>> 
>>> In the case of VFIO the capability may already exist and adding new
>>> capability into the beginning of the linked list may create a loop.
>>> 
>>> For example, the old code destroys the following config
>>> of PCIe Intel E1000E:
>>> 
>>> before adding PCI_CAP_ID_MSI (0x05):
>>> 0x34: 0xC8
>>> 0xC8: 0x01 0xD0
>>> 0xD0: 0x05 0xE0
>>> 0xE0: 0x10 0x00
>>> 
>>> after:
>>> 0x34: 0xD0
>>> 0xC8: 0x01 0xD0
>>> 0xD0: 0x05 0xC8
>>> 0xE0: 0x10 0x00
>>> 
>>> As result capabilities 0x01 and 0x05 point to each other.
>>> 
>>> The proposed patch does not change capability pointers when
>>> the same type capability is about to add.
>>> 
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>> hw/pci.c |   10 ++
>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/hw/pci.c b/hw/pci.c
>>> index aa0c0b8..1f7c924 100644
>>> --- a/hw/pci.c
>>> +++ b/hw/pci.c
>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t 
>>> cap_id,
>>>   }
>>> 
>>>   config = pdev->config + offset;
>>> -config[PCI_CAP_LIST_ID] = cap_id;
>>> -config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>> -pdev->config[PCI_CAPABILITY_LIST] = offset;
>>> -pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>> +if (config[PCI_CAP_LIST_ID] != cap_id) {
>> 
>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a 
>> loop through all capabilities, check if the one you want to add is there 
>> already and if so either
>> * replace the existing one or
>> * drop out and not write the new one in.
 
  * hw_error :)
 
>> 
>> I'm not sure which way would be more natural.
> 
> There is a third option - add another function, lets call it
> pci_fixup_capability() which would do whatever pci_add_capability() does
> but won't touch list pointers.
 
 What good is a function that breaks internal consistency?
>>> 
>>> 
>>> It is broken already by having PCIDevice.used field. Normally 
>>> pci_add_capability() would go through
>>> the whole list and add a capability if it does not exist. Emulated devices 
>>> which care about having a
>>> capability at some fixed offset would have initialized their config space 
>>> before calling this
>>> capabilities API (as VFIO does).
>>> 
>>> If we really want to support emulated devices which 

Re: [Qemu-devel] Is it possible to retrieve pre-process information in QEMU?

2012-05-21 Thread 陳韋任
> >  I would like to know if I can retrieve pre-process information in QEMU
> > system mode. For example, I want to know each process's page fault ratio.
> > Is there a way to do that?
> 
> logically, it's possible, but you need to locate the task_struct of
> each processes first. Using GDB, that might be a bit easier but still
> not easy.
> 
> Why not just monitor it inside the guest? using system tap for example?

  O.K., what I did is something like below,

---
void tlb_fill(CPUARMState *env1, target_ulong addr, int is_write, int mmu_idx,
  uintptr_t retaddr)
{
ret = cpu_arm_handle_mmu_fault(env, addr, is_write, mmu_idx);
if (unlikely(ret)) {
page_fault++;   // page fault
}
env = saved_env;
}
---

  IIUC, cpu_arm_handle_mmu_fault will lookup guest page table, return 1 if there
is a page fault, that's why I add a counter there. But this way I'll collect a
global static not per-process one. I'll try systemtap latter but not sure it
does the same thing I want to do.

  Any thoughts? :)

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj



Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

2012-05-21 Thread Benjamin Herrenschmidt
On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
> Alexander,
> 
> Is that any better? :)

Alex (Graf that is), ping ?

The original patch from Alexey was fine btw.

VFIO will always call things with the existing capability offset so
there's no real risk of doing the wrong thing or break the list or
anything.

IE. A small simple patch that addresses the problem :-)

The new patch is a bit more "robust" I believe, I don't think we need to
go too far to fix a problem we don't have. But we need a fix for the
real issue and the simple patch does it neatly from what I can
understand.

Cheers,
Ben.

> 
> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>   * in pci config space */
>  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> uint8_t offset, uint8_t size)
>  {
> -uint8_t *config;
> +uint8_t *config, existing;
>  int i, overlapping_cap;
> 
> +existing = pci_find_capability(pdev, cap_id);
> +if (existing) {
> +if (offset && (existing != offset)) {
> +return -EEXIST;
> +}
> +for (i = existing; i < size; ++i) {
> +if (pdev->used[i]) {
> +return -EFAULT;
> +}
> +}
> +memset(pdev->used + offset, 0xFF, size);
> +/* Make capability read-only by default */
> +memset(pdev->wmask + offset, 0, size);
> +/* Check capability by default */
> +memset(pdev->cmask + offset, 0xFF, size);
> +return existing;
> +}
> +
>  if (!offset) {
>  offset = pci_find_space(pdev, size);
>  if (!offset) {
>  return -ENOSPC;
> 
> 
> 
> 
> 
> 
> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
> > On 12/05/12 00:13, Alexander Graf wrote:
> >>
> >> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
> >>
> >>> 11.05.2012 20:52, Alexander Graf написал:
> 
>  On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
> 
> > Normally the pci_add_capability is called on devices to add new
> > capability. This is ok for emulated devices which capabilities list
> > is being built by QEMU.
> >
> > In the case of VFIO the capability may already exist and adding new
> > capability into the beginning of the linked list may create a loop.
> >
> > For example, the old code destroys the following config
> > of PCIe Intel E1000E:
> >
> > before adding PCI_CAP_ID_MSI (0x05):
> > 0x34: 0xC8
> > 0xC8: 0x01 0xD0
> > 0xD0: 0x05 0xE0
> > 0xE0: 0x10 0x00
> >
> > after:
> > 0x34: 0xD0
> > 0xC8: 0x01 0xD0
> > 0xD0: 0x05 0xC8
> > 0xE0: 0x10 0x00
> >
> > As result capabilities 0x01 and 0x05 point to each other.
> >
> > The proposed patch does not change capability pointers when
> > the same type capability is about to add.
> >
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > hw/pci.c |   10 ++
> > 1 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/pci.c b/hw/pci.c
> > index aa0c0b8..1f7c924 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t 
> > cap_id,
> >}
> >
> >config = pdev->config + offset;
> > -config[PCI_CAP_LIST_ID] = cap_id;
> > -config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> > -pdev->config[PCI_CAPABILITY_LIST] = offset;
> > -pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> > +if (config[PCI_CAP_LIST_ID] != cap_id) {
> 
>  This doesn't scale. Capabilities are a list of CAPs. You'll have to do a 
>  loop through all capabilities, check if the one you want to add is there 
>  already and if so either
>   * replace the existing one or
>   * drop out and not write the new one in.
> >>
> >>   * hw_error :)
> >>
> 
>  I'm not sure which way would be more natural.
> >>>
> >>> There is a third option - add another function, lets call it
> >>> pci_fixup_capability() which would do whatever pci_add_capability() does
> >>> but won't touch list pointers.
> >>
> >> What good is a function that breaks internal consistency?
> > 
> > 
> > It is broken already by having PCIDevice.used field. Normally 
> > pci_add_capability() would go through
> > the whole list and add a capability if it does not exist. Emulated devices 
> > which care about having a
> > capability at some fixed offset would have initialized their config space 
> > before calling this
> > capabilities API (as VFIO does).
> > 
> > If we really want to support emulated devices which want some capabilities 
> > be at fixed offset and
> > others at random offsets (strange, but ok), I do not see how it is bad to 
> > restore this consistency
> > by special function (pci_fixup_capability()) to avoid its rewriting at 
> > different location as a guest
> > driver may care about its offset.
> > 

Re: [Qemu-devel] The image size of instance VM keeps growing

2012-05-21 Thread Charles . Tsai-蔡清海-研究發展部
Stefan,

Thank you for information. We will try to isolate the issue here and give you 
an update.


-Original Message-
From: Stefan Hajnoczi [mailto:stefa...@gmail.com] 
Sent: Monday, May 21, 2012 9:24 PM
To: Charles.Tsai-蔡清海-研究發展部
Cc: qemu-devel@nongnu.org; Jonah.Wu-吳君勉-研究發展部
Subject: Re: [Qemu-devel] The image size of instance VM keeps growing

On Mon, May 21, 2012 at 10:08 AM, Charles.Tsai-蔡清海-研究發展部
 wrote:
> We can run a tool to see if there is any active I/O or not. Does the image 
> size shrink back if an I/O activity stops?
> What we found here is that the image size kept growing. If this problem 
> persists, it could eat up the entire disk space.

It's possible for an image to grow larger than its backing file due to the 
layout and metadata of qcow2 (or any image file format).  But this overhead 
should be fixed to maximum 1 MB, maybe (assuming you are not using qcow2 
snapshots where the file can grow arbitrarily).

If the image file is growing endlessly then there is a problem :).
The starting point is figuring out which I/O request causes this to happen so 
it can be reproduced and debugged.  Can you isolate the problem to a few steps 
that can be reproduced?

Stefan


Re: [Qemu-devel] [PATCH next v2 00/74] QOM CPUState, part 3: CPU reset

2012-05-21 Thread Andreas Färber
Am 21.05.2012 20:20, schrieb Blue Swirl:
> On Mon, May 21, 2012 at 9:09 AM, Andreas Färber  wrote:
>> Am 14.05.2012 23:22, schrieb Blue Swirl:
>>> On Mon, May 14, 2012 at 8:59 PM, Andreas Färber  wrote:
 Am 14.05.2012 21:54, schrieb Blue Swirl:
> On Mon, May 14, 2012 at 4:15 PM, Andreas Färber  wrote:
>> Am 10.05.2012 02:13, schrieb Andreas Färber:
>>> Andreas Färber (74):
>> [...]
>>>   target-sparc: Let cpu_sparc_init() return SPARCCPU
>>>   sun4m: Use cpu_sparc_init() to obtain SPARCCPU
>>>   sun4m: Pass SPARCCPU to {main,secondary}_cpu_reset()
>>>   sun4u: Use cpu_sparc_init() to obtain SPARCCPU
>>>   sun4u: Let cpu_devinit() return SPARCCPU
>>>   sun4u: Store SPARCCPU in ResetData
>>>   leon3: Use cpu_sparc_init() to obtain SPARCCPU
>>>   leon3: Store SPARCCPU in ResetData
>> [...]

 Forgot to mention the bsd-user patch.

>>>   Kill off cpu_state_reset()
>>
>> Ping! Blue, can you ack please?
>
> Otherwise the patches look pretty safe ("if it compiles, it works").
>>
>> Was I supposed to interpret that as an Acked-by? ;)
> 
> No, but you may do so if it compiles. ;-)

Joking, are we? My cover letter indicated pretty clearly on what
platforms I had already compile-tested on, and that meant each patch.
I had also run `make check` on each and the sparc test image for sun4m,
HelenOS for sun4u and "crible" for leon3.

Applied the remainder of the series to qom-next:
http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-next

Tristan/Fabien, could you please submit a patch to document the Leon3
machine in MAINTAINERS? No one got cc'ed on those two patches. Thanks.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function

2012-05-21 Thread Benjamin Herrenschmidt
On Tue, 2012-05-22 at 01:22 +0300, Michael S. Tsirkin wrote:
> > Again, from which originator ? From a given initiator, nothing
> bypasses
> > anything, so the right thing to do here is a full mb(). However, I
> > suspect what you are talking about here is read -responses- not
> > bypassing writes in the direction of the response (ie, the
> "flushing"
> > semantic of reads) which is a different matter.
> 
> No. My spec says:
> A3, A4
> A Posted Request must be able to pass Non-Posted Requests to avoid
> deadlocks. 

An additional note about that one: It only applies obviously if the
initiator doesn't wait for the read response before shooting the write.

Most devices actually do wait. Here too, we would have to explicitly
make sure if what is the right semantic on an individual device basis.

Ben.





[Qemu-devel] [ANNOUNCE] kvm-kmod-3.4

2012-05-21 Thread Jan Kiszka
After the release of kernel 3.4, this is now the announcements of the
corresponding kvm-kmod-3.4. The package is available from

http://sourceforge.net/projects/kvm/files/kvm-kmod/3.4/kvm-kmod-3.4.tar.bz2/download

See [1] for further details on kvm-kmod.

Major KVM changes since kvm-kmod-3.3.6:
 - Timekeeping improvements
   (specifically: Don't mark TSC unstable due to S4 suspend)
 - Support for assigning host PCI devices that share interrupt lines
   (only usable with host kernels >= 3.3)
 - Fix privilege checks during task switches
 - SVM: Add support for AMD's OSVW feature in guests
 - x86: increase recommended max vcpus to 160
 - VMX: remove yield_on_hlt
 - various smaller fixes

kvm-kmod changes:
 - none

Note that I'm considering to stop distributing updated kernel headers
along with kvm-kmod with the next release. The only known users of this
feature, QEMU, stopped relying on it 2, soon 3, releases ago. I assume
people pulling in latest KVM modules via kvm-kmod will also regularly
update their QEMU version.

Jan

[1] http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions

2012-05-21 Thread Benjamin Herrenschmidt
On Tue, 2012-05-22 at 01:44 +0300, Michael S. Tsirkin wrote:
> 
> OK. Just not another level of indirect function callbacks please.  Make
> it a library so each bus can do the right thing.  There are not so many
> buses. 

I think we are a long way from having to deal with subtly different
ordering rules, at this stage I'm keen on just sticking it in the inline
dma_ accessor, period.

We can always have flags in the dma context to indicate the sort of
ordering, avoids indirect pointer calls which can be costly (often
mispredicted).

Ben.





Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function

2012-05-21 Thread Benjamin Herrenschmidt
On Tue, 2012-05-22 at 01:22 +0300, Michael S. Tsirkin wrote:
> > Again, from which originator ? From a given initiator, nothing
> bypasses
> > anything, so the right thing to do here is a full mb(). However, I
> > suspect what you are talking about here is read -responses- not
> > bypassing writes in the direction of the response (ie, the
> "flushing"
> > semantic of reads) which is a different matter.
> 
> No. My spec says:
> A3, A4
> A Posted Request must be able to pass Non-Posted Requests to avoid
> deadlocks.

Right, a read + write can become write + read at the target, I forgot
about that, or you can deadlock due to the flush semantics, but a write
+ read must remain in order or am I missing something ?

And write + read afaik is typically the one that x86 can re-order
without a barrier isn't it ?

> > Also don't forget that
> > this is only a semantic of PCI, not of the system fabric, ie, a
> device
> > DMA read doesn't flush a CPU write that is still in that CPU store
> > queue.
> 
> We need to emulate what hardware does IMO.
> So if usb has different rules it needs a different barriers.

Who talks about USB here ? Whatever rules USB has only matter between
the USB device and the USB controller emulation, USB doesn't sit
directly on the memory bus doing DMA, it all goes through the HCI, which
adheres the the ordering rules of whatever bus it sits on.

Here, sanity must prevail :-) I suggest ordering by default...

> > And it's not correct. With that setup, DMA writes can pass DMA reads
> > (and vice-versa) which doesn't correspond to the guarantees of the
> PCI
> > spec.
> 
> Cite the spec please. Express spec matches this at least.

Sure, see above. Yes I did forgot that a read + write could be
re-ordered on PCI but that isn't the case of a write + read, or am I
reading the table sideways ?

> It's a lot of work. We measured the effect for virtio in
> the past. I don't think we need to redo it.

virtio is specifically our high performance case and what I'm proposing
isn't affecting it.

> > > You said above x86 is unaffected. This is portability, not safety.
> > 
> > x86 is unaffected by the missing wmb/rmb, it might not be unaffected
> by
> > the missing ordering between loads and stores, I don't know, as I
> said,
> > I don't fully know the x86 memory model.
> 
> You don't need to understand it. Assume memory-barriers.h is correct.

In which case we still need a full mb() unless we can convince ourselves
that the ordering between a write and a subsequent read can be relaxed
safely and I'm really not sure about it.

> > In any case, opposing "portability" to "safety" the way you do it
> means
> > you are making assumptions that basically "qemu is written for x86
> and
> > nothing else matters".
> 
> No. But find a way to fix power without hurting working setups.

And ARM ;-)

Arguably x86 is wrong too anyway, at least from a strict interpretation
off the spec (and unless I missed something).

> > If that's your point of view, so be it and be clear about it, but I
> will
> > disagree :-) And while I can understand that powerpc might not be
> > considered as the most important arch around at this point in time,
> > these problems are going to affect ARM as well.
> > 
> > > > Almost all our
> > > > devices were written without any thought given to ordering, so
> they
> > > > basically can and should be considered as all broken.
> > > 
> > > Problem is, a lot of code is likely broken even after you sprinkle
> > > barriers around. For example qemu might write A then B where guest
> driver
> > > expects to see B written before A.
> > 
> > No, this is totally unrelated bugs, nothing to do with barriers. You
> are
> > mixing up two completely different problems and using one as an
> excuse
> > to not fix the other one :-)
> > 
> > A device with the above problem would be broken today on x86
> regardless.
> > 
> > > > Since thinking
> > > > about ordering is something that, by experience, very few
> programmer can
> > > > do and get right, the default should absolutely be fully
> ordered.
> > > 
> > > Give it bus ordering. That is not fully ordered.
> > 
> > It pretty much is actually, look at your PCI spec :-)
> 
> I looked. 2.4.1.  Transaction Ordering Rules
> 
> > > > Performance regressions aren't a big deal to bisect in that
> case: If
> > > > there's a regression for a given driver and it points to this
> specific
> > > > patch adding the barriers then we know precisely where the
> regression
> > > > come from, and we can get some insight about how this specific
> driver
> > > > can be improved to use more relaxed accessors.
> > > > 
> > > > I don't see the problem.
> > > > 
> > > > One thing that might be worth looking at is if indeed mb() is so
> much
> > > > more costly than just wmb/rmb, in which circumstances we could
> have some
> > > > smarts in the accessors to make them skip the full mb based on
> knowledge
> > > > of previous access direction, though here too I would be tempted
> to only
>

[Qemu-devel] [Bug 990364] Re: virtio_ioport_write: unexpected address 0x13 value 0x1

2012-05-21 Thread Vitalis
But with new drivers i got "virtio_ioport_write: unexpected address 0x13
value 0x1" again.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/990364

Title:
  virtio_ioport_write: unexpected address 0x13 value 0x1

Status in QEMU:
  New

Bug description:
  Hello! I have:

  virtio_ioport_write: unexpected address 0x13 value 0x1

  on config:

  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 3072 
-smp 1 -name nata_xp -uuid da607499-1d8f-e7ef-d1d2-38
  1c1839e4ba -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/nata_xp.monitor,server,nowait 
-monitor chardev:monitor -localtime -boot c -drive 
file=/root/nata_xp.qcow2,if=virtio,index=0,boot=on,format=raw
  ,cache=none -drive 
file=/home/admino/virtio-win-0.1-22.iso,if=ide,media=cdrom,index=2,format=raw 
-net nic,macaddr=00:16:36:06:02:69,vlan=0,model=virtio,name=virtio.0 -net 
tap,fd=43,vlan=0,name=tap.0 -serial
  none -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:3 -k en-us -vga 
cirrus
  pci_add_option_rom: failed to find romfile "pxe-virtio.bin"

  with kernel 2.6.32-40-generic #87-Ubuntu SMP Tue Mar 6 00:56:56 UTC 2012 
x86_64 GNU/Linux
  qemu drivers are virtio-win-0.1-22.iso
  kvm version 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.18
  qemu 0.12.3+noroms-0ubuntu9.18

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/990364/+subscriptions



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions

2012-05-21 Thread Michael S. Tsirkin
On Mon, May 21, 2012 at 05:31:06PM -0500, Anthony Liguori wrote:
> On 05/21/2012 05:26 PM, Benjamin Herrenschmidt wrote:
> >On Mon, 2012-05-21 at 17:18 -0500, Anthony Liguori wrote:
> >>But this isn't what this series is about.
> >>
> >>This series is only attempting to make sure that writes are ordered
> >>with respect
> >>to other writes in main memory.
> >
> >Actually, it applies to both reads and writes. They can't pass each
> >other either and that can be fairly important.
> 
> That's fine but that's a detail of the bus.
> 
> >It's in fact the main contention point because if it was only writes we
> >could just use wmb and be done with it (that's a nop on x86).
> >
> >Because we are trying to order everything (and specifically store
> >followed by a load), we need a full barrier which is more expensive on
> >x86.
> 
> I think the thing to do is make the barrier implemented in the dma
> API and allow it to be overridden by the bus.  The default
> implementation should be a full barrier.

I think what's called for is what Ben proposed: track
last transaction and use the appropriate barrier.

> If we can establish that the bus guarantees a weaker ordering
> guarantee, a bus could override the default implementation and do
> something weaker.
> 
> Regards,
> 
> Anthony Liguori

OK. Just not another level of indirect function callbacks please.  Make
it a library so each bus can do the right thing.  There are not so many
buses.

-- 
MST



Re: [Qemu-devel] [PATCH next v2 00/74] QOM CPUState, part 3: CPU reset

2012-05-21 Thread Andreas Färber
Am 21.05.2012 15:52, schrieb Alexander Graf:
> Very simple, nice and clean ;)
> 
> PowerPC parts are:
> 
>   Acked-by: Alexander Graf 

Thanks, applied to qom-next:
http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-next

David and Edgar, you two were apparently not cc'ed. Could you submit
patches to document the pseries and virtex-ml507 machines respectively
in MAINTAINERS? Thanks.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions

2012-05-21 Thread Michael S. Tsirkin
On Mon, May 21, 2012 at 05:18:21PM -0500, Anthony Liguori wrote:
> On 05/21/2012 03:31 AM, Michael S. Tsirkin wrote:
> >More than that. smp_mb is pretty expensive. You
> >often can do just smp_wmb and smp_rmb and that is
> >very cheap.
> >Many operations run in the vcpu context
> >or start when guest exits to host and work
> >is bounced from there and thus no barrier is needed
> >here.
> >
> >Example? start_xmit in e1000. Executed in vcpu context
> >so no barrier is needed.
> >
> >virtio of course is another example since it does its own
> >barriers. But even without that, virtio_blk_handle_output
> >runs in vcpu context.
> >
> >But more importantly, this hack just sweeps the
> >dirt under the carpet. Understanding the interaction
> >with guest drivers is important anyway. So
> 
> But this isn't what this series is about.
> 
> This series is only attempting to make sure that writes are ordered
> with respect to other writes in main memory.

They it should use smp_wmb() not smp_mb().
I would be 100% fine with that.

> It's based on the assumption that write ordering is well defined
> (and typically strict) on most busses including PCI.  I have not
> confirmed this myself but I trust that Ben has.
> 
> So the only problem trying to be solved here is to make sure that if
> a write A is issued by the device model while it's on PCPU 0, if
> PCPU 1 does a write B to another location, and then the device model
> runs on PCPU 2 and does a read of both A and B, it will only see the
> new value of B if the it sees the new value of A.
> 
> Whether the driver on VCPU 0 (which may be on any PCPU) also sees
> the write ordering is irrelevant.
> 
> If you want to avoid taking a barrier on every write, we can make
> use of map() and issue explicit barriers (as virtio does).
> 
> Regards,
> 
> Anthony Liguori
> 
> >I really don't see why don't we audit devices
> >and add proper barriers.
> >



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions

2012-05-21 Thread Anthony Liguori

On 05/21/2012 05:26 PM, Benjamin Herrenschmidt wrote:

On Mon, 2012-05-21 at 17:18 -0500, Anthony Liguori wrote:

But this isn't what this series is about.

This series is only attempting to make sure that writes are ordered
with respect
to other writes in main memory.


Actually, it applies to both reads and writes. They can't pass each
other either and that can be fairly important.


That's fine but that's a detail of the bus.


It's in fact the main contention point because if it was only writes we
could just use wmb and be done with it (that's a nop on x86).

Because we are trying to order everything (and specifically store
followed by a load), we need a full barrier which is more expensive on
x86.


I think the thing to do is make the barrier implemented in the dma API and allow 
it to be overridden by the bus.  The default implementation should be a full 
barrier.


If we can establish that the bus guarantees a weaker ordering guarantee, a bus 
could override the default implementation and do something weaker.


Regards,

Anthony Liguori



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions

2012-05-21 Thread Benjamin Herrenschmidt
On Mon, 2012-05-21 at 17:18 -0500, Anthony Liguori wrote:
> But this isn't what this series is about.
> 
> This series is only attempting to make sure that writes are ordered
> with respect 
> to other writes in main memory.

Actually, it applies to both reads and writes. They can't pass each
other either and that can be fairly important.

It's in fact the main contention point because if it was only writes we
could just use wmb and be done with it (that's a nop on x86).

Because we are trying to order everything (and specifically store
followed by a load), we need a full barrier which is more expensive on
x86.

Ben.

> It's based on the assumption that write ordering is well defined (and
> typically 
> strict) on most busses including PCI.  I have not confirmed this
> myself but I 
> trust that Ben has.
> 
> So the only problem trying to be solved here is to make sure that if a
> write A 
> is issued by the device model while it's on PCPU 0, if PCPU 1 does a
> write B to 
> another location, and then the device model runs on PCPU 2 and does a
> read of 
> both A and B, it will only see the new value of B if the it sees the
> new value of A.
> 
> Whether the driver on VCPU 0 (which may be on any PCPU) also sees the
> write 
> ordering is irrelevant.
> 
> If you want to avoid taking a barrier on every write, we can make use
> of map() 
> and issue explicit barriers (as virtio does).
> 
> 




Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function

2012-05-21 Thread Michael S. Tsirkin
On Tue, May 22, 2012 at 07:58:17AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-05-21 at 15:18 +0300, Michael S. Tsirkin wrote:
> 
> > > mb is more than just "read flush writes" (besides it's not a statement
> > > about flushing, it's a statement about ordering. whether it has a
> > > flushing side effect on x86 is a separate issue, it doesn't on power for
> > > example).
> > 
> > I referred to reads not bypassing writes on PCI.
> 
> Again, from which originator ? From a given initiator, nothing bypasses
> anything, so the right thing to do here is a full mb(). However, I
> suspect what you are talking about here is read -responses- not
> bypassing writes in the direction of the response (ie, the "flushing"
> semantic of reads) which is a different matter.

No. My spec says:
A3, A4
A Posted Request must be able to pass Non-Posted Requests to avoid
deadlocks.


> Also don't forget that
> this is only a semantic of PCI, not of the system fabric, ie, a device
> DMA read doesn't flush a CPU write that is still in that CPU store
> queue.

We need to emulate what hardware does IMO.
So if usb has different rules it needs a different barriers.


> > This is the real argument in my eyes: that we
> > should behave the way real hardware does.
> 
> But that doesn't really make much sense since we don't actually have a
> non-coherent bus sitting in the middle :-)
> 
> However we should as much as possible be observed to behave as such, I
> agree, though I don't think we need to bother too much about timings
> since we don't really have way to enforce the immediate visibility of
> stores within the coherent domain without a bunch of arch specific very
> very heavy hammers which we really don't want to wield at this point.
> 
> > > Real flushing out writes matters very much in real life in two very
> > > different contexts that tend to not affect emulation in qemu as much.
> > > 
> > > One is flushing write in the opposite direction (or rather, having the
> > > read response queued up behind those writes) which is critical to
> > > ensuring proper completion of DMAs after an LSI from a guest driver
> > > perspective on real HW typically.
> > > 
> > > The other classic case is to flush posted MMIO writes in order to ensure
> > > that a subsequent delay is respected.
> > > 
> > > Most of those don't actually matter when doing emulation. Besides a
> > > barrier won't provide you the second guarantee, you need a nastier
> > > construct at least on some architectures like power.
> > 
> > Exactly. This is what I was saying too.
> 
> Right and I'm reasonably sure that none of those above is our problem. 
> 
> As I said, at this point, what I want to sort out is purely the
> observable ordering of DMA transactions. The side effect of reads in one
> direction on writes in the other direction is an orthogonal problem
> which as I wrote above is probably not hurting us.
> 
> > > However, we do need to ensure that read and writes are properly ordered
> > > vs. each other (regardless of any "flush" semantic) or things could go
> > > very wrong on OO architectures (here too, my understanding on x86 is
> > > limited).
> > 
> > Right. Here's a compromize:
> > - add smp_rmb() on any DMA read
> > - add smp_wmb( on any DMA write
> > This is almost zero cost on x86 at least.
> > So we are not regressing existing setups.
> 
> And it's not correct. With that setup, DMA writes can pass DMA reads
> (and vice-versa) which doesn't correspond to the guarantees of the PCI
> spec.

Cite the spec please. Express spec matches this at least.

> The question I suppose is whether this is a problem in practice...
> 
> > Are there any places where devices do read after write?
> 
> It's possible, things like update of a descriptor followed by reading of
> the next one, etc...  I don't have an example hot in mind right know of
> a device that would be hurt but I'm a bit nervous as this would be a
> violation of the PCI guaranteed ordering.
> 
> > My preferred way is to find them and do pci_dma_flush() invoking
> > smp_mb(). If there is such a case it's likely on datapath anyway
> > so we do care.
> > 
> > But I can also live with a global flag "latest_dma_read"
> > and on read we could do
> > if (unlikely(latest_dma_read))
> > smp_mb();
> > 
> > if you really insist on it
> > though I do think it's inelegant.
> 
> Again, why do you object on simply making the default accessors fully
> ordered ? Do you think it will be a measurable different in most cases ?
> 
> Shouldn't we measure it first ?

It's a lot of work. We measured the effect for virtio in
the past. I don't think we need to redo it.

> > You said above x86 is unaffected. This is portability, not safety.
> 
> x86 is unaffected by the missing wmb/rmb, it might not be unaffected by
> the missing ordering between loads and stores, I don't know, as I said,
> I don't fully know the x86 memory model.

You don't need to understand it. Assume memory-barriers.h is correct.

> In any ca

Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access functions

2012-05-21 Thread Anthony Liguori

On 05/21/2012 03:31 AM, Michael S. Tsirkin wrote:

More than that. smp_mb is pretty expensive. You
often can do just smp_wmb and smp_rmb and that is
very cheap.
Many operations run in the vcpu context
or start when guest exits to host and work
is bounced from there and thus no barrier is needed
here.

Example? start_xmit in e1000. Executed in vcpu context
so no barrier is needed.

virtio of course is another example since it does its own
barriers. But even without that, virtio_blk_handle_output
runs in vcpu context.

But more importantly, this hack just sweeps the
dirt under the carpet. Understanding the interaction
with guest drivers is important anyway. So


But this isn't what this series is about.

This series is only attempting to make sure that writes are ordered with respect 
to other writes in main memory.


It's based on the assumption that write ordering is well defined (and typically 
strict) on most busses including PCI.  I have not confirmed this myself but I 
trust that Ben has.


So the only problem trying to be solved here is to make sure that if a write A 
is issued by the device model while it's on PCPU 0, if PCPU 1 does a write B to 
another location, and then the device model runs on PCPU 2 and does a read of 
both A and B, it will only see the new value of B if the it sees the new value of A.


Whether the driver on VCPU 0 (which may be on any PCPU) also sees the write 
ordering is irrelevant.


If you want to avoid taking a barrier on every write, we can make use of map() 
and issue explicit barriers (as virtio does).


Regards,

Anthony Liguori


I really don't see why don't we audit devices
and add proper barriers.






Re: [Qemu-devel] Get current env within io_handler ?

2012-05-21 Thread Edgar E. Iglesias
On Mon, May 21, 2012 at 06:40:38PM +, Blue Swirl wrote:
> On Mon, May 21, 2012 at 6:28 PM, Peter Maydell  
> wrote:
> > On 21 May 2012 19:08, Blue Swirl  wrote:
> >> On Mon, May 21, 2012 at 10:36 AM, Peter Maydell
> >>  wrote:
> >>> I think it would be nice to have this in upstream qemu; however adding
> >>> transaction properties to the IO interface would be quite tricky I
> >>> suspect...
> >>
> >> This is limited to CPU and CPU local bus devices (not generic like
> >> PCI), so there could be an out of band mechanism to get/set the
> >> additional data.
> >
> > What's your definition of "generic" here? AMBA isn't particularly
> > limited to CPU local bus devices either, really (for instance
> > the connection between a versatile express CPU daughterboard
> > and the motherboard includes an AMBA AXI bus).
> 
> I'd expect that only AMBA devices (ARM specific) would care about the
> properties, while for example NE2k could not care less.
> 
> >
> >> For example store in op_helper.c could use
> >> cpu_set_amba_properties(...) before the store and afterwards
> >> cpu_get_amba_reply(...).
> >
> > We really don't want to do two helper function calls for every
> > load or store! If you're going to implement them at all then
> > you need a more efficient implementation than that...
> 
> How about lazy evaluation: generate the properties/evaluate reply only
> if the device wants them via
> device_get_properties()/device_set_reply(). Then the transaction
> overhead could be ignored by everyone if not needed.


Hi,

This party came up when evaluating the mem API. IMO, it would be nice
to have a way for the master (CPU, DMA or whatever) to be able to pass
attributes with accesses. Possibly also for the device to return
return codes (or error codes) as a result of the access.

The details may be bus specific (AMBA, OCP, etc) but the concept is
not. Unfortunately, I'm afraid it would involve quite a bit of changes to
the code again unless someone comes up with a smart way of doing it...

Cheers



Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function

2012-05-21 Thread Benjamin Herrenschmidt
On Mon, 2012-05-21 at 15:18 +0300, Michael S. Tsirkin wrote:

> > mb is more than just "read flush writes" (besides it's not a statement
> > about flushing, it's a statement about ordering. whether it has a
> > flushing side effect on x86 is a separate issue, it doesn't on power for
> > example).
> 
> I referred to reads not bypassing writes on PCI.

Again, from which originator ? From a given initiator, nothing bypasses
anything, so the right thing to do here is a full mb(). However, I
suspect what you are talking about here is read -responses- not
bypassing writes in the direction of the response (ie, the "flushing"
semantic of reads) which is a different matter. Also don't forget that
this is only a semantic of PCI, not of the system fabric, ie, a device
DMA read doesn't flush a CPU write that is still in that CPU store
queue.

> This is the real argument in my eyes: that we
> should behave the way real hardware does.

But that doesn't really make much sense since we don't actually have a
non-coherent bus sitting in the middle :-)

However we should as much as possible be observed to behave as such, I
agree, though I don't think we need to bother too much about timings
since we don't really have way to enforce the immediate visibility of
stores within the coherent domain without a bunch of arch specific very
very heavy hammers which we really don't want to wield at this point.

> > Real flushing out writes matters very much in real life in two very
> > different contexts that tend to not affect emulation in qemu as much.
> > 
> > One is flushing write in the opposite direction (or rather, having the
> > read response queued up behind those writes) which is critical to
> > ensuring proper completion of DMAs after an LSI from a guest driver
> > perspective on real HW typically.
> > 
> > The other classic case is to flush posted MMIO writes in order to ensure
> > that a subsequent delay is respected.
> > 
> > Most of those don't actually matter when doing emulation. Besides a
> > barrier won't provide you the second guarantee, you need a nastier
> > construct at least on some architectures like power.
> 
> Exactly. This is what I was saying too.

Right and I'm reasonably sure that none of those above is our problem. 

As I said, at this point, what I want to sort out is purely the
observable ordering of DMA transactions. The side effect of reads in one
direction on writes in the other direction is an orthogonal problem
which as I wrote above is probably not hurting us.

> > However, we do need to ensure that read and writes are properly ordered
> > vs. each other (regardless of any "flush" semantic) or things could go
> > very wrong on OO architectures (here too, my understanding on x86 is
> > limited).
> 
> Right. Here's a compromize:
> - add smp_rmb() on any DMA read
> - add smp_wmb( on any DMA write
> This is almost zero cost on x86 at least.
> So we are not regressing existing setups.

And it's not correct. With that setup, DMA writes can pass DMA reads
(and vice-versa) which doesn't correspond to the guarantees of the PCI
spec. The question I suppose is whether this is a problem in practice...

> Are there any places where devices do read after write?

It's possible, things like update of a descriptor followed by reading of
the next one, etc...  I don't have an example hot in mind right know of
a device that would be hurt but I'm a bit nervous as this would be a
violation of the PCI guaranteed ordering.

> My preferred way is to find them and do pci_dma_flush() invoking
> smp_mb(). If there is such a case it's likely on datapath anyway
> so we do care.
> 
> But I can also live with a global flag "latest_dma_read"
> and on read we could do
>   if (unlikely(latest_dma_read))
>   smp_mb();
> 
> if you really insist on it
> though I do think it's inelegant.

Again, why do you object on simply making the default accessors fully
ordered ? Do you think it will be a measurable different in most cases ?

Shouldn't we measure it first ?

> You said above x86 is unaffected. This is portability, not safety.

x86 is unaffected by the missing wmb/rmb, it might not be unaffected by
the missing ordering between loads and stores, I don't know, as I said,
I don't fully know the x86 memory model.

In any case, opposing "portability" to "safety" the way you do it means
you are making assumptions that basically "qemu is written for x86 and
nothing else matters".

If that's your point of view, so be it and be clear about it, but I will
disagree :-) And while I can understand that powerpc might not be
considered as the most important arch around at this point in time,
these problems are going to affect ARM as well.

> > Almost all our
> > devices were written without any thought given to ordering, so they
> > basically can and should be considered as all broken.
> 
> Problem is, a lot of code is likely broken even after you sprinkle
> barriers around. For example qemu might write A then B where guest driver

Re: [Qemu-devel] [RFC PATCH 3/4] block: Enable QEMU to retrieve passed fd before attempting open

2012-05-21 Thread Eric Blake
On 05/21/2012 02:19 PM, Corey Bryant wrote:
> With this patch, when QEMU needs to "open" a file, it will first
> check to see if a matching filename/fd pair were passed via the
> -filefd command line option or the getfd_file monitor command.
> If a match is found, QEMU will use the passed fd and will not
> attempt to open the file.  Otherwise, if a match is not found,
> QEMU will attempt to open the file on it's own.
> 
> Signed-off-by: Corey Bryant 

> +int file_open(const char *filename, int flags, mode_t mode)
> +{
> +int fd;
> +
> +#ifdef _WIN32
> +return qemu_open(filename, flags, mode);
> +#else

Would it be any easier to write:

#ifndef _WIN32
  qemu_get_fd_file() stuff
#endif
  return qemu_open()

so that you aren't repeating the return line?


> +fd = qemu_get_fd_file(filename, false);
> +if (fd != -1) {
> +return dup(fd);

Why are you dup'ing the fd?  That just sounds like a way to leak fds.
Remember, the existing 'getfd' monitor command doesn't dup things - it
either gets consumed by a successful use of the named fd, or it remains
open on failure and the user can close it by calling 'closefd'.

Or, if you are intentionally allowing the user to reuse the fd for more
than one qemu open instance, you need to document this point.

What happens if qemu wants O_WRONLY or O_RDWR access, but the user
passed in an fd with only O_RDONLY access?

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH 2/4] qmp/hmp: Add getfd_file monitor command

2012-05-21 Thread Eric Blake
On 05/21/2012 02:19 PM, Corey Bryant wrote:
> This patch provides support for the getfd_file monitor command.
> This command will allow passing of a filename and its corresponding
> file descriptor to a guest via the monitor.  This command could be
> followed, for example, by a drive_add command to hot attach a disk
> drive.
> 
> Signed-off-by: Corey Bryant 

Is the only difference between 'getfd' and 'getfd_file' the fact that
'getfd' introduces an abstract namespace usable only by the fd:
protocol, while the 'getfd_file' introduces a name identical to the
absolute naming of the file system and usable by the file: protocol?
What happens if I pass 'getfd_file' a relative file name?  Must the
filename passed to 'getfd_file' be in canonical form, or may it contain
symlinks, .., and other non-canonical constructs?

Can the 'closefd' command be used to close the fd originally given to
qemu via 'getfd_file'?

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH 1/4] qemu-options: Add -filefd command line option

2012-05-21 Thread Eric Blake
On 05/21/2012 02:19 PM, Corey Bryant wrote:
> This patch provides support for the -filefd command line option.
> This option will allow passing of a filename and its corresponding
> file descriptor to QEMU at exec time.
> 
> Signed-off-by: Corey Bryant 

> +DEF("filefd", HAS_ARG, QEMU_OPTION_filefd,
> +"-filefd file=,fd=\n"

I take it that if filename contains ',', then we have to escape it on
the command line?  Is it worth passing fd first and file second by
default, as a possible way to avoid the need for escaping, or does the
option parser not care about ordering?

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 1/2] pci: Add pci_device_get_host_irq

2012-05-21 Thread Michael S. Tsirkin
On Mon, May 21, 2012 at 03:58:57PM -0300, Jan Kiszka wrote:
> On 2012-05-21 14:34, Michael S. Tsirkin wrote:
> > On Mon, May 21, 2012 at 10:13:47AM -0300, Jan Kiszka wrote:
> >> Add a PCI IRQ path discovery function that walks from a given device to
> >> the host bridge, returning the IRQ number that is reported to the
> >> attached interrupt controller. For this purpose, another PCI bridge
> >> callback function is introduced: map_host_irq. It is so far only
> >> implemented by the PIIX3, other host bridges can be added later on as
> >> required.
> >>
> >> Will be used for KVM PCI device assignment.
> >>
> >> Signed-off-by: Jan Kiszka 
> > 
> > interrupt injection is data path even for emulated devices.
> > So instead of special casing device assignment I would like to see all
> > devices converted to an API that caches irqs.
> > 
> > This will likely mean that we can maintain the final
> > irq as part of the pci device structure, and
> > this api will simply return it.
> 
> Yep, I definitely agree. It's just that such a design has to please even
> more users than PCI devices, thus will likely take longer to settle than
> the device assignment effort. Therefore I decided to rush forward with
> an intermediate approach first.
> 
> Jan

I think it's easy, will try to do it soon.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 1/2] pci: Add pci_device_get_host_irq

2012-05-21 Thread Michael S. Tsirkin
On Mon, May 21, 2012 at 05:35:34PM -0300, Jan Kiszka wrote:
> On 2012-05-21 16:05, Michael S. Tsirkin wrote:
> > On Mon, May 21, 2012 at 10:13:47AM -0300, Jan Kiszka wrote:
> >> @@ -386,6 +387,14 @@ static void piix3_set_irq(void *opaque, int pirq, int 
> >> level)
> >>  piix3_set_irq_level(piix3, pirq, level);
> >>  }
> >>  
> >> +static int piix3_map_host_irq(void *opaque, int pci_intx)
> >> +{
> >> +PIIX3State *piix3 = opaque;
> >> +int host_irq = piix3->dev.config[PIIX_PIRQC + pci_intx];
> >> +
> >> +return host_irq < PIIX_NUM_PIC_IRQS ? host_irq : -1;
> >> +}
> >> +
> >>  /* irq routing is changed. so rebuild bitmap */
> >>  static void piix3_update_irq_levels(PIIX3State *piix3)
> >>  {
> > 
> > 
> > So, instead of special API just for assignment,
> > I would like to see map_irq in piix being reworked
> > to take dev config into account.
> > I think piix is almost unique in this but need to check,
> 
> Maybe it is the only host bridge with routing that we have in QEMU right
> now, but it is surely not unique (e.g. all later Intel chipsets support
> this).

Yes. APIs for this should be in place. Just saying
instead of failing we can easily just make it work
if there are no remappings.

> > if not fix other host buses that are programmable.
> > PCI bridges are all fixed routing.
> > 
> > Then we can drop set_irq callback.
> 
> set_irq may do more than IRQ routing. It may also flip polarity (see
> bonito.c).

In that case host_map_irq is not a good API?
Need to investigate.

> I guess we need to analyze the requirements of all in-tree
> host bridges first.

Yes.

> > 
> > Finally all devices can cache the irq#,
> > and piix would scan devices behind it
> > and update the irq.
> > 
> > Assignment then just needs a notifier, since
> > it owns the device just a pointer in device is
> > enough.
> 
> I cannot completely follow your ideas here yet. Do you want to pass the
> mapping information along the notifier, or why "just ... a notifier"?


Each device would resolve IRQs that it uses.


> > 
> > Could you look at doing this please?
> > If no I can give it a stub.
> > 
> 
> Unless we can converge over a final API quickly, I would suggest to base
> these refactorings on top of what I propose here. We will have to touch
> all host bridges more invasively for this anyway. If you can explain to
> me how simple the refactoring can be (but I'm a bit skeptical ;) ), I
> could do this, otherwise I would prefer to focus on the device
> assignment topic which provide some more nuts.
> 
> Jan

Yes it's simple. I'll post in a couple of days (lots of
meetings tomorrow). I think you'll
see it's easier and cleaner.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-21 Thread Anthony Liguori

On 05/21/2012 02:39 PM, Amit Shah wrote:

On (Wed) 16 May 2012 [13:23:11], Anthony Liguori wrote:

On 05/16/2012 12:21 PM, Amit Shah wrote:

On (Wed) 16 May 2012 [08:24:22], Anthony Liguori wrote:

On 05/16/2012 06:30 AM, Amit Shah wrote:

The Linux kernel already has a virtio-rng driver, this is the device
implementation.

When Linux needs more entropy, it puts a buffer in the vq.  We then put
entropy into that buffer, and push it back to the guest.

Feeding randomness from host's /dev/urandom into the guest is
sufficient, so this is a simple implementation that opens /dev/urandom
and reads from it whenever required.

Invocation is simple:

   qemu ... -device virtio-rng-pci

In the guest, we see

   $ cat /sys/devices/virtual/misc/hw_random/rng_available
   virtio

   $ cat /sys/devices/virtual/misc/hw_random/rng_current
   virtio

There are ways to extend the device to be more generic and collect
entropy from other sources, but this is simple enough and works for now.

Signed-off-by: Amit Shah


It's not this simple unfortunately.

If you did this with libvirt, one guest could exhaust the available
entropy for the remaining guests.  This could be used as a mechanism
for one guest to attack another (reducing the available entropy for
key generation).

You need to rate limit the amount of entropy that a guest can obtain
to allow management tools to mitigate this attack.


Hm, rate-limiting is a good point.  However, we're using /dev/urandom
here, which is nonblocking, and will keep on providing data as long as
we keep reading.


But you're still exhausting the entropy pool (which is a global
resource). That's the problem.


I understand.  It's been shown, however, that /dev/urandom isn't
easily exhausted, and can be used as a reliable random source for
quite a few years without new seeding.  And even if a guest (or more)
is malicious, the guest doing such activities would itself continue to
generate some seed for the host's pool, strengthening /dev/urandom.

I don't know where to cite the data from, but I'll pass on that info
when I have a reference.


Okay, I need to some type of reference here.

Not implementing virtio-rng for all of these years wasn't a simple oversight. 
It was specifically out of fear of exhausting the entropy pool.


I'm pretty opposed to merging this without addressing entropy exhaustion because 
it's a subtle enough issue that I don't think most users would be capable of 
understanding the ramifications.


Regards,

Anthony Liguori



In the meantime, I'll add a rate-limiting option to the device, it
does seem like a good idea to implement nevertheless.

Amit





[Qemu-devel] [ANNOUNCE] libguestfs 1.18 released - tools for managing virtual machines and disk images

2012-05-21 Thread Richard W.M. Jones

I'm pleased to announce the latest stable release of libguestfs, a
library and a set of tools for reading, writing, managing, inspecting,
rescuing, resizing and aligning disk images, and offline and live
virtual machines.  There are many new features and bug fixes in this
release; see the release notes below.

You can get libguestfs 1.18.0 from:

Main website:  http://libguestfs.org
Source:http://libguestfs.org/download/1.18-stable/
Fedora 17: https://admin.fedoraproject.org/updates/libguestfs-1.18.0-1.fc17
Debian/Ubuntu: [coming soon]

Rich.


Release notes for libguestfs 1.18.0
---

These release notes only cover the differences from the previous
stable/dev branch split (1.16.0).  For detailed changelogs, please see
the git repository, or the ChangeLog file distributed in the tarball.

New features

  virt tools:

   - virt-sysprep has been rewritten and expanded (thanks Wanlong Gao)
 http://libguestfs.org/virt-sysprep.1.html

   - virt-sparsify --zero is a new option that zeroes the named
 partition or filesystem

   - virt-sparsify can now safely sparsify Linux swap partitions

   - virt-sparsify fixed so it cleans up after ^C
 http://libguestfs.org/virt-sparsify.1.html

   - a new tool 'libguestfs-make-fixed-appliance' is provided to build
 fixed appliances that can be copied to other machines that don't
 have febootstrap support
 http://libguestfs.org/libguestfs-make-fixed-appliance.1.html

   - virt-filesystems now displays the parents (containers) of MD
 devices and volume groups

   - virt-alignment-scan, run with no args, displays alignment information
 for all libvirt domains

   - virt-df and virt-alignment-scan will display information from all
 guests even when a disk is inaccessible

   - virt-rescue new --scratch option to make scratch disks
 https://rwmj.wordpress.com/2012/04/26/virt-rescue-scratch/#content

   - virt-make-fs can now be used to create btrfs

   - virt-edit preserves permissions, UID, GID and SELinux context
 when editing files

   - guestfish passes the close event over stdout and remote correctly

   - guestfish new '--pipe-error' option lets you detect errors in pipe
 commands

   - guestfish globs now expand device names

   - comma and colon characters in filenames now handled correctly by
 all virt tools

  inspection:

   - added support for Fedora 17+

   - added support for FreeDOS

   - added support for Buildroot and Cirros

   - inspection is now compatible with Windows guests that have been
 sysprepped (thanks Grant Williamson).

  API:

   - broad support for btrfs added, including adding multiple devices,
 fsck, snapshots (thanks Wanlong Gao)

   - the new 'mount-local' API brings FUSE support directly into the
 core libguestfs API
 
https://rwmj.wordpress.com/2012/05/14/tip-using-mount-local-api-from-c/#content

   - new man page: guestfs-performance(1), which contains performance
 tuning tips
 http://libguestfs.org/guestfs-performance.1.html

   - new man page: guestfs-faq(1), Frequently Asked Questions
 http://libguestfs.org/guestfs-faq.1.html

   - ENOTSUP (from guestfs_last_errno) is now returned for APIs that
 are not supported

  examples:

   - 'copy_over' example showing how to copy between two handles

   - 'display_icon' program displays the icon associated with a guest

   - 'mount_local.c' example shows how to use the mount-local API

Security

  (no security problems were found or fixed in this release)

New APIs

  btrfs-device-add: Add devices to a btrfs filesystem.
  btrfs-device-delete: Remove devices from a btrfs filesystem.
  btrfs-filesystem-sync: Sync a btrfs filesystem.
  btrfs-filesystem-balance: Balance a btrfs filesystem.
  btrfs-fsck: Check btrfs filesystem.
  btrfs-set-seeding: Enable or disable seeding.
  btrfs-subvolume-create: Create a btrfs snapshot.
  btrfs-subvolume-delete: Delete a btrfs snapshot.
  btrfs-subvolume-list: List btrfs snapshots and subvolumes.
  btrfs-subvolume-set-default: Set default btrfs subvolume.
  btrfs-subvolume-snapshot: Create a writable btrfs snapshot.
  get-e2attrs: List ext2 file attributes of a file.
  get-e2generation: Get ext2 file generation of a file.
  isoinfo, isoinfo-device: Get information from the header of ISO files.
  llz: List files with SELinux information.
  lvcreate-free: Create an LVM logical volume in % remaining free space.
  md-stat: Return underlying devices from an MD device.
  mkfs-brtfs: Make btrfs filesystem, with all tunables.
  mount-local, mount-local-run, umount-local: FUSE support in the API.
  ntfsclone-in, ntfsclone-out: Save, restore NTFS from backup.
  ntfsfix: Fix common errors and force Windows to check NTFS.
  set-e2attrs: Set or clear ext2 file attributes of a file.
  set-e2generation: Set ext2 file generation of a file.
  set-label: Unified interface for setting filesystem label.
  vgmeta: Get volume group metadata.
  wipefs: Wipe fi

Re: [Qemu-devel] [PATCH 1/2] pci: Add pci_device_get_host_irq

2012-05-21 Thread Jan Kiszka
On 2012-05-21 16:05, Michael S. Tsirkin wrote:
> On Mon, May 21, 2012 at 10:13:47AM -0300, Jan Kiszka wrote:
>> @@ -386,6 +387,14 @@ static void piix3_set_irq(void *opaque, int pirq, int 
>> level)
>>  piix3_set_irq_level(piix3, pirq, level);
>>  }
>>  
>> +static int piix3_map_host_irq(void *opaque, int pci_intx)
>> +{
>> +PIIX3State *piix3 = opaque;
>> +int host_irq = piix3->dev.config[PIIX_PIRQC + pci_intx];
>> +
>> +return host_irq < PIIX_NUM_PIC_IRQS ? host_irq : -1;
>> +}
>> +
>>  /* irq routing is changed. so rebuild bitmap */
>>  static void piix3_update_irq_levels(PIIX3State *piix3)
>>  {
> 
> 
> So, instead of special API just for assignment,
> I would like to see map_irq in piix being reworked
> to take dev config into account.
> I think piix is almost unique in this but need to check,

Maybe it is the only host bridge with routing that we have in QEMU right
now, but it is surely not unique (e.g. all later Intel chipsets support
this).

> if not fix other host buses that are programmable.
> PCI bridges are all fixed routing.
> 
> Then we can drop set_irq callback.

set_irq may do more than IRQ routing. It may also flip polarity (see
bonito.c). I guess we need to analyze the requirements of all in-tree
host bridges first.

> 
> Finally all devices can cache the irq#,
> and piix would scan devices behind it
> and update the irq.
> 
> Assignment then just needs a notifier, since
> it owns the device just a pointer in device is
> enough.

I cannot completely follow your ideas here yet. Do you want to pass the
mapping information along the notifier, or why "just ... a notifier"?

> 
> Could you look at doing this please?
> If no I can give it a stub.
> 

Unless we can converge over a final API quickly, I would suggest to base
these refactorings on top of what I propose here. We will have to touch
all host bridges more invasively for this anyway. If you can explain to
me how simple the refactoring can be (but I'm a bit skeptical ;) ), I
could do this, otherwise I would prefer to focus on the device
assignment topic which provide some more nuts.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static

2012-05-21 Thread Peter Maydell
On 21 May 2012 21:10, Jim Meyering  wrote:
> Peter Maydell wrote:
>> On 21 May 2012 20:51, Jim Meyering  wrote:
>>> From: Jim Meyering 
>>>
>>>
>>> Signed-off-by: Jim Meyering 
>>> ---
>>>  tcg/tcg.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>>> index ab589c7..350fdad 100644
>>> --- a/tcg/tcg.c
>>> +++ b/tcg/tcg.c
>>> @@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)
>>>
>>>  /* Must statically initialize the version, because GDB may check
>>>    the version before we can set it.  */
>>> -struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
>>> +static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
>>>
>>>  /* End GDB interface.  */
>>
>> Nak. This symbol is global so that gdb can find it by fishing around
>> in the executable's symbol table.
>
> Thanks for the quick feedback.
>
> How does the scope of the symbol affect whether gdb can find it?

If you mark it 'static' the compiler can throw it away or completely
rearrange it if it's feeling clever enough, I think.

Anyway, we're following a prescribed interface here:
http://sourceware.org/gdb/onlinedocs/gdb/Declarations.html

and I don't think we should deviate from it. As the comment says,
"THE FOLLOWING MUST MATCH GDB DOCS.".

> If declaring this variable "static" is not appropriate,
> then the comment saying that static initialization is desired
> should be changed.

The comment means "statically initialize this variable rather than
doing it dynamically in some function at startup".

-- PMM



[Qemu-devel] [PATCH 9/9] convert many more globals to static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 

Minor exceptions:
* arm-dis: move now-detected-as-unused static variables into #if-0'd
block of code where they *are* used.
* microblaze: remove decls of now-detected-as-unused vars

Signed-off-by: Jim Meyering 
---
 arm-dis.c |  8 ++---
 cpus.c|  4 +--
 cris-dis.c|  2 +-
 hw/9pfs/virtio-9p-synth.c |  2 +-
 hw/ide/pci.c  |  2 +-
 hw/leon3.c|  2 +-
 hw/mips_fulong2e.c|  2 +-
 hw/s390-virtio-bus.c  |  2 +-
 hw/spapr_rtas.c   |  2 +-
 hw/xen_platform.c |  2 +-
 hw/xgmac.c|  2 +-
 m68k-dis.c| 79 ---
 memory.c  |  2 +-
 microblaze-dis.c  |  6 ++--
 ppc-dis.c | 26 
 sh4-dis.c |  2 +-
 target-cris/translate.c   |  2 +-
 target-i386/cpu.c |  4 +--
 target-i386/kvm.c |  2 +-
 tests/fdc-test.c  |  2 +-
 vl.c  | 12 ---
 21 files changed, 84 insertions(+), 83 deletions(-)

diff --git a/arm-dis.c b/arm-dis.c
index 6bc4d71..43f0253 100644
--- a/arm-dis.c
+++ b/arm-dis.c
@@ -1545,10 +1545,6 @@ enum map_type {
   MAP_DATA
 };

-enum map_type last_type;
-int last_mapping_sym = -1;
-bfd_vma last_mapping_addr = 0;
-
 /* Decode a bitfield of the form matching regexp (N(-N)?,)*N(-N)?.
Returns pointer to following character of the format string and
fills in *VALUEP and *WIDTHP with the extracted value and number of
@@ -3877,6 +3873,10 @@ print_insn_arm (bfd_vma pc, struct disassemble_info 
*info)
 #if 0
   bfd_boolean   found = false;

+  static enum map_type last_type;
+  static int last_mapping_sym = -1;
+  static bfd_vma last_mapping_addr;
+
   if (info->disassembler_options)
 {
   parse_disassembler_options (info->disassembler_options);
diff --git a/cpus.c b/cpus.c
index b182b3d..d5cd318 100644
--- a/cpus.c
+++ b/cpus.c
@@ -84,7 +84,7 @@ typedef struct TimersState {
 int64_t dummy;
 } TimersState;

-TimersState timers_state;
+static TimersState timers_state;

 /* Return the virtual CPU time, based on the instruction counter.  */
 int64_t cpu_get_icount(void)
@@ -611,7 +611,7 @@ static void qemu_tcg_init_cpu_signals(void)
 }
 #endif /* _WIN32 */

-QemuMutex qemu_global_mutex;
+static QemuMutex qemu_global_mutex;
 static QemuCond qemu_io_proceeded_cond;
 static bool iothread_requesting_mutex;

diff --git a/cris-dis.c b/cris-dis.c
index 1d174ba..89ae0b7 100644
--- a/cris-dis.c
+++ b/cris-dis.c
@@ -1215,7 +1215,7 @@ cris_cc_strings[] =
 };

 /* Different names and semantics for condition  (0xf).  */
-const struct cris_cond15 cris_cond15s[] =
+static const struct cris_cond15 cris_cond15s[] =
 {
   /* FIXME: In what version did condition "ext" disappear?  */
   {"ext", cris_ver_v0_3},
diff --git a/hw/9pfs/virtio-9p-synth.c b/hw/9pfs/virtio-9p-synth.c
index 92e0b09..7a15da2 100644
--- a/hw/9pfs/virtio-9p-synth.c
+++ b/hw/9pfs/virtio-9p-synth.c
@@ -21,7 +21,7 @@
 #include 

 /* Root node for synth file system */
-V9fsSynthNode v9fs_synth_root = {
+static V9fsSynthNode v9fs_synth_root = {
 .name = "/",
 .actual_attr = {
 .mode = 0555 | S_IFDIR,
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 88c0942..ac22d56 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -419,7 +419,7 @@ static const VMStateDescription vmstate_bmdma_current = {
 }
 };

-const VMStateDescription vmstate_bmdma_status = {
+static const VMStateDescription vmstate_bmdma_status = {
 .name ="ide bmdma/status",
 .version_id = 1,
 .minimum_version_id = 1,
diff --git a/hw/leon3.c b/hw/leon3.c
index 0a5ff16..4d9cd68 100644
--- a/hw/leon3.c
+++ b/hw/leon3.c
@@ -208,7 +208,7 @@ static void leon3_generic_hw_init(ram_addr_t  ram_size,
 }
 }

-QEMUMachine leon3_generic_machine = {
+static QEMUMachine leon3_generic_machine = {
 .name = "leon3_generic",
 .desc = "Leon-3 generic",
 .init = leon3_generic_hw_init,
diff --git a/hw/mips_fulong2e.c b/hw/mips_fulong2e.c
index 1a8df10..b266141 100644
--- a/hw/mips_fulong2e.c
+++ b/hw/mips_fulong2e.c
@@ -389,7 +389,7 @@ static void mips_fulong2e_init(ram_addr_t ram_size, const 
char *boot_device,
 network_init();
 }

-QEMUMachine mips_fulong2e_machine = {
+static QEMUMachine mips_fulong2e_machine = {
 .name = "fulong2e",
 .desc = "Fulong 2e mini pc",
 .init = mips_fulong2e_init,
diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index 63ccd5c..114f131 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -45,7 +45,7 @@

 #define VIRTIO_EXT_CODE   0x2603

-struct BusInfo s390_virtio_bus_info = {
+static struct BusInfo s390_virtio_bus_info = {
 .name   = "s390-virtio",
 .size   = sizeof(VirtIOS390Bus),
 };
diff --git a/hw/spapr_rtas.c b/hw/spapr_rtas.c
index ae18595..315d0fb 100644
--- a/hw/spapr_rtas.c
+++ b/hw/spapr_rtas.c
@@ -204,7 +204,7 @@ static struct rtas_call {
 spapr_rtas_fn fn;
 } 

[Qemu-devel] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file

2012-05-21 Thread Corey Bryant
libvirt's sVirt security driver provides SELinux MAC isolation for
Qemu guest processes and their corresponding image files.  In other
words, sVirt uses SELinux to prevent a QEMU process from opening
files that do not belong to it.

sVirt provides this support by labeling guests and resources with
security labels that are stored in file system extended attributes.
Some file systems, such as NFS, do not support the extended
attribute security namespace, and therefore cannot support sVirt
isolation.

A solution to this problem is to provide fd passing support, where
libvirt opens files and passes file descriptors to QEMU.  This,
along with SELinux policy to prevent QEMU from opening files, can
provide image file isolation for NFS files.

This patch series adds the -filefd command-line option and the
getfd_file monitor command.  This will enable libvirt to open a
file and push the corresponding filename and file descriptor to
QEMU.  When QEMU needs to "open" a file, it will first check if the
file descriptor was passed by either of these methods before
attempting to actually open the file.

This series reuses the file_open function that Anthony Liguori
 created for the most recent fd passing
prototype.  It also reuses the test driver that Stefan
Hajnoczi  created for that prototype,
with several modifications.

Corey Bryant (4):
  qemu-options: Add -filefd command line option
  qmp/hmp: Add getfd_file monitor command
  block: Enable QEMU to retrieve passed fd before attempting open
  Example -filefd and getfd_file server

 block.c   |   31 +++
 block/raw-posix.c |   20 +++---
 block/raw-win32.c |4 +-
 block/vdi.c   |4 +-
 block/vmdk.c  |   21 ++---
 block/vpc.c   |2 +-
 block/vvfat.c |4 +-
 block_int.h   |   12 +++
 hmp-commands.hx   |   17 
 monitor.c |   70 +
 monitor.h |3 +
 qemu-config.c |   17 
 qemu-config.h |1 +
 qemu-options.hx   |   17 
 qemu-tool.c   |5 +
 qmp-commands.hx   |   30 +++
 test-fd-passing.c |  224 +
 vl.c  |6 ++
 18 files changed, 459 insertions(+), 29 deletions(-)
 create mode 100644 test-fd-passing.c

-- 
1.7.7.6




[Qemu-devel] [RFC PATCH 2/4] qmp/hmp: Add getfd_file monitor command

2012-05-21 Thread Corey Bryant
This patch provides support for the getfd_file monitor command.
This command will allow passing of a filename and its corresponding
file descriptor to a guest via the monitor.  This command could be
followed, for example, by a drive_add command to hot attach a disk
drive.

Signed-off-by: Corey Bryant 
---
 hmp-commands.hx |   17 +
 monitor.c   |   70 +++
 monitor.h   |3 ++
 qemu-tool.c |5 
 qmp-commands.hx |   30 +++
 5 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..9cd5ed1 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1240,6 +1240,23 @@ used by another monitor command.
 ETEXI
 
 {
+.name   = "getfd_file",
+.args_type  = "filename:s",
+.params = "getfd_file filename",
+.help   = "receive a file descriptor via SCM rights and assign it 
a filename",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_getfd_file,
+},
+
+STEXI
+@item getfd_file @var{filename}
+@findex getfd_file
+If a file descriptor is passed alongside this command using the SCM_RIGHTS
+mechanism on unix sockets, it is stored using the name @var{filename} for
+later use by other monitor commands.
+ETEXI
+
+{
 .name   = "block_passwd",
 .args_type  = "device:B,password:s",
 .params = "block_passwd device password",
diff --git a/monitor.c b/monitor.c
index 12a6fe2..bdf4dd8 100644
--- a/monitor.c
+++ b/monitor.c
@@ -163,6 +163,7 @@ struct Monitor {
 #endif
 QError *error;
 QLIST_HEAD(,mon_fd_t) fds;
+QLIST_HEAD(,mon_fd_t) file_fds;
 QLIST_ENTRY(Monitor) entry;
 };
 
@@ -2256,6 +2257,42 @@ static int do_closefd(Monitor *mon, const QDict *qdict, 
QObject **ret_data)
 return -1;
 }
 
+static int do_getfd_file(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *filename = qdict_get_str(qdict, "filename");
+mon_fd_t *monfd;
+int fd;
+
+fd = qemu_chr_fe_get_msgfd(mon->chr);
+if (fd == -1) {
+qerror_report(QERR_FD_NOT_SUPPLIED);
+return -1;
+}
+
+if (qemu_isdigit(filename[0])) {
+qerror_report(QERR_INVALID_PARAMETER_VALUE, "filename",
+  "a name not starting with a digit");
+return -1;
+}
+
+QLIST_FOREACH(monfd, &mon->file_fds, next) {
+if (strcmp(monfd->name, filename) != 0) {
+continue;
+}
+
+close(monfd->fd);
+monfd->fd = fd;
+return 0;
+}
+
+monfd = g_malloc0(sizeof(mon_fd_t));
+monfd->name = g_strdup(filename);
+monfd->fd = fd;
+
+QLIST_INSERT_HEAD(&mon->file_fds, monfd, next);
+return 0;
+}
+
 static void do_loadvm(Monitor *mon, const QDict *qdict)
 {
 int saved_vm_running  = runstate_is_running();
@@ -2292,6 +2329,39 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
 return -1;
 }
 
+int monitor_get_fd_file(Monitor *mon, const char *filename,
+bool take_ownership)
+{
+mon_fd_t *monfd;
+
+QLIST_FOREACH(monfd, &mon->file_fds, next) {
+int fd;
+
+if (strcmp(monfd->name, filename) != 0) {
+continue;
+}
+
+fd = monfd->fd;
+
+if (take_ownership) {
+/* caller takes ownership of fd */
+QLIST_REMOVE(monfd, next);
+g_free(monfd->name);
+g_free(monfd);
+}
+
+return fd;
+}
+
+return -1;
+}
+
+int qemu_get_fd_file(const char *filename, bool take_ownership)
+{
+return cur_mon ?
+   monitor_get_fd_file(cur_mon, filename, take_ownership) : -1;
+}
+
 /* mon_cmds and info_cmds would be sorted at runtime */
 static mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
diff --git a/monitor.h b/monitor.h
index 0d49800..529d8a7 100644
--- a/monitor.h
+++ b/monitor.h
@@ -60,6 +60,9 @@ int monitor_read_block_device_key(Monitor *mon, const char 
*device,
   void *opaque);
 
 int monitor_get_fd(Monitor *mon, const char *fdname);
+int monitor_get_fd_file(Monitor *mon, const char *filename,
+bool take_ownership);
+int qemu_get_fd_file(const char *filename, bool take_ownership);
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
 GCC_FMT_ATTR(2, 0);
diff --git a/qemu-tool.c b/qemu-tool.c
index 07fc4f2..d3d86bf 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -111,3 +111,8 @@ void migrate_add_blocker(Error *reason)
 void migrate_del_blocker(Error *reason)
 {
 }
+
+int qemu_get_fd_file(const char *fdname, bool take_ownership)
+{
+return -1;
+}
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db980fa..1825a91 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -891,6 +891,36 @@ Example:
 EQMP
 
 {
+.name   = "getfd_file",
+.args_type  = "filename:s",
+.params = "getfd_file filename",
+.help 

[Qemu-devel] [RFC PATCH 3/4] block: Enable QEMU to retrieve passed fd before attempting open

2012-05-21 Thread Corey Bryant
With this patch, when QEMU needs to "open" a file, it will first
check to see if a matching filename/fd pair were passed via the
-filefd command line option or the getfd_file monitor command.
If a match is found, QEMU will use the passed fd and will not
attempt to open the file.  Otherwise, if a match is not found,
QEMU will attempt to open the file on it's own.

Signed-off-by: Corey Bryant 
---
 block.c   |   31 +++
 block/raw-posix.c |   20 ++--
 block/raw-win32.c |4 ++--
 block/vdi.c   |4 ++--
 block/vmdk.c  |   21 +
 block/vpc.c   |2 +-
 block/vvfat.c |4 ++--
 block_int.h   |   12 
 vl.c  |6 ++
 9 files changed, 75 insertions(+), 29 deletions(-)

diff --git a/block.c b/block.c
index af2ab4f..6472114 100644
--- a/block.c
+++ b/block.c
@@ -102,6 +102,37 @@ static BlockDriverState *bs_snapshots;
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
+static int filename_match(QemuOpts *opts, void *opaque)
+{
+const char *file = qemu_opt_get(opts, "file");
+int fd = qemu_opt_get_number(opts, "fd", -1);
+
+return (strcmp((char *)opaque, file) == 0) ? fd : 0;
+}
+
+int file_open(const char *filename, int flags, mode_t mode)
+{
+int fd;
+
+#ifdef _WIN32
+return qemu_open(filename, flags, mode);
+#else
+
+fd = qemu_opts_foreach(qemu_find_opts("filefd"), filename_match,
+   (void *)filename, 1);
+if (fd != 0) {
+return dup(fd);
+}
+
+fd = qemu_get_fd_file(filename, false);
+if (fd != -1) {
+return dup(fd);
+}
+
+return qemu_open(filename, flags, mode);
+#endif
+}
+
 #ifdef _WIN32
 static int is_windows_drive_prefix(const char *filename)
 {
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 03fcfcc..4f7b40f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -208,7 +208,7 @@ static int raw_open_common(BlockDriverState *bs, const char 
*filename,
 s->open_flags |= O_DSYNC;
 
 s->fd = -1;
-fd = qemu_open(filename, s->open_flags, 0644);
+fd = file_open(filename, s->open_flags, 0644);
 if (fd < 0) {
 ret = -errno;
 if (ret == -EROFS)
@@ -568,8 +568,8 @@ static int raw_create(const char *filename, 
QEMUOptionParameter *options)
 options++;
 }
 
-fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
-  0644);
+fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+   0644);
 if (fd < 0) {
 result = -errno;
 } else {
@@ -741,7 +741,7 @@ static int hdev_open(BlockDriverState *bs, const char 
*filename, int flags)
 if ( bsdPath[ 0 ] != '\0' ) {
 strcat(bsdPath,"s0");
 /* some CDs don't have a partition 0 */
-fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
+fd = file_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE, 0);
 if (fd < 0) {
 bsdPath[strlen(bsdPath)-1] = '1';
 } else {
@@ -798,7 +798,7 @@ static int fd_open(BlockDriverState *bs)
 #endif
 return -EIO;
 }
-s->fd = open(bs->filename, s->open_flags & ~O_NONBLOCK);
+s->fd = file_open(bs->filename, s->open_flags & ~O_NONBLOCK, 0);
 if (s->fd < 0) {
 s->fd_error_time = get_clock();
 s->fd_got_error = 1;
@@ -872,7 +872,7 @@ static int hdev_create(const char *filename, 
QEMUOptionParameter *options)
 options++;
 }
 
-fd = open(filename, O_WRONLY | O_BINARY);
+fd = file_open(filename, O_WRONLY | O_BINARY, 0);
 if (fd < 0)
 return -errno;
 
@@ -950,7 +950,7 @@ static int floppy_probe_device(const char *filename)
 if (strstart(filename, "/dev/fd", NULL))
 prio = 50;
 
-fd = open(filename, O_RDONLY | O_NONBLOCK);
+fd = file_open(filename, O_RDONLY | O_NONBLOCK, 0);
 if (fd < 0) {
 goto out;
 }
@@ -1003,7 +1003,7 @@ static void floppy_eject(BlockDriverState *bs, bool 
eject_flag)
 close(s->fd);
 s->fd = -1;
 }
-fd = open(bs->filename, s->open_flags | O_NONBLOCK);
+fd = file_open(bs->filename, s->open_flags | O_NONBLOCK, 0);
 if (fd >= 0) {
 if (ioctl(fd, FDEJECT, 0) < 0)
 perror("FDEJECT");
@@ -1053,7 +1053,7 @@ static int cdrom_probe_device(const char *filename)
 int prio = 0;
 struct stat st;
 
-fd = open(filename, O_RDONLY | O_NONBLOCK);
+fd = file_open(filename, O_RDONLY | O_NONBLOCK, 0);
 if (fd < 0) {
 goto out;
 }
@@ -1177,7 +1177,7 @@ static int cdrom_reopen(BlockDriverState *bs)
  */
 if (s->fd >= 0)
 close(s->fd);
-fd = open(bs->filename, s->open_flags, 0644);
+fd = file_open(bs->filename, s->open_flags, 0644);
 if (fd < 0) {
 s->fd = -1;
 return -EIO;
diff --git a/block/raw-win32.c b/block/raw-win32.c
index e4b0b75..

[Qemu-devel] [RFC PATCH 4/4] Example -filefd and getfd_file server

2012-05-21 Thread Corey Bryant
This example demonstrates use of the -filefd command to open two
disk drives at start-up time.  It also demonstrates hot attaching
a third disk drive with the getfd_file monitor command.  I still
have some learning to do with regards to QMP, so the example is
using a not-so-program-friendly HMP method.

Usage:
./test-fd-passing /path/hda.img /path/hdb.img /path/hdc.img

Signed-off-by: Corey Bryant 
---
 test-fd-passing.c |  224 +
 1 files changed, 224 insertions(+), 0 deletions(-)
 create mode 100644 test-fd-passing.c

diff --git a/test-fd-passing.c b/test-fd-passing.c
new file mode 100644
index 000..d568198
--- /dev/null
+++ b/test-fd-passing.c
@@ -0,0 +1,224 @@
+/*
+ * QEMU -filefd and getfd_file test server
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Stefan Hajnoczi   
+ *  Corey Bryant  
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ * gcc -Wall -o test-fd-passing test-fd-passing.c
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int openQemuMonitor(const char *monitor)
+{
+int i;
+int ret;
+struct sockaddr_un addr;
+int monfd = 0;
+
+if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
+perror("socket");
+goto error;
+}
+
+memset(&addr, 0, sizeof(addr));
+addr.sun_family = AF_UNIX;
+strcpy(addr.sun_path, monitor);
+
+for (i = 0; i < 100; i++) {
+ret = connect(monfd, (struct sockaddr *) &addr, sizeof(addr));
+if (ret == 0) {
+break;
+}
+usleep(.2 * 100);
+}
+
+if (ret != 0) {
+fprintf(stderr, "no monitor socket");
+goto error;
+}
+
+return monfd;
+
+error:
+close(monfd);
+return -1;
+}
+
+static int issueHMPCmdFD(int monfd,const char *data, size_t len, int fd)
+{
+int ret;
+struct msghdr msg;
+struct iovec iov[1];
+char control[CMSG_SPACE(sizeof(int))];
+struct cmsghdr *cmsg;
+
+memset(&msg, 0, sizeof(msg));
+
+iov[0].iov_base = (void *)data;
+iov[0].iov_len = len;
+
+msg.msg_iov = iov;
+msg.msg_iovlen = 1;
+
+msg.msg_control = control;
+msg.msg_controllen = sizeof(control);
+
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+cmsg->cmsg_level = SOL_SOCKET;
+cmsg->cmsg_type = SCM_RIGHTS;
+memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
+
+do {
+ret = sendmsg(monfd, &msg, 0);
+} while (ret < 0 && errno == EINTR);
+
+return ret;
+}
+
+int main(int argc, char *argv[]) {
+int rc;
+int fd1, fd2, hotfd, monfd=-1;
+int flags = O_RDWR;
+int mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
+pid_t child_pid;
+char *drive_str_1 = NULL;;
+char *drive_str_2 = NULL;;
+char *filefd_str_1 = NULL;
+char *filefd_str_2 = NULL;
+char *getfd_file_str = NULL;
+char *drive_add_str = NULL;
+char *device_add_str = NULL;
+
+if (argc != 4) {
+fprintf(stderr, "usage: %s   
\n", argv[0]);
+goto error;
+}
+
+fd1 = open(argv[1], flags, mode);
+if (fd1 == -1) {
+perror("open");
+goto error;
+}
+
+fd2 = open(argv[2], flags, mode);
+if (fd2 == -1) {
+perror("open");
+goto error;
+}
+
+hotfd = open(argv[3], flags, mode);
+if (hotfd == -1) {
+perror("open");
+goto error;
+}
+
+asprintf(&drive_str_1, "file=%s,if=none,id=drive-virtio-disk0", argv[1]);
+asprintf(&filefd_str_1, "file=%s,fd=%d", argv[1], fd1);
+asprintf(&drive_str_2, "file=%s,if=none,id=drive-virtio-disk1", argv[2]);
+asprintf(&filefd_str_2, "file=%s,fd=%d", argv[2], fd2);
+
+char *child_argv[] = {
+"qemu-system-x86_64",
+"-enable-kvm",
+"-m", "1024",
+"-chardev",
+
"socket,id=charmonitor,path=/var/lib/libvirt/qemu/RHEL62.monitor,server,nowait",
+"-mon",
+"chardev=charmonitor,id=monitor,mode=readline",
+"-drive", drive_str_1,
+"-device",
+
"virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0",
+"-drive", drive_str_2,
+"-device",
+
"virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1",
+"-filefd", filefd_str_1,
+"-filefd", filefd_str_2,
+"-vnc", ":0",
+NULL,
+};
+
+if (posix_spawn(&child_pid, "/usr/local/bin/qemu-system-x86_64",
+NULL, NULL, child_argv, environ) != 0) {
+perror("posix_spawn\n");
+goto error;
+}
+
+monfd = openQemuMonitor("/var/lib/libvirt/qemu/RHEL62.monitor");
+if (monfd == -1) {
+goto error;
+}
+
+asprintf(&getfd_file_str, "getfd_file %s\r\n", argv[3]);
+rc = issueHMPCmdFD(monfd, getfd_file_str,

[Qemu-devel] [RFC PATCH 1/4] qemu-options: Add -filefd command line option

2012-05-21 Thread Corey Bryant
This patch provides support for the -filefd command line option.
This option will allow passing of a filename and its corresponding
file descriptor to QEMU at exec time.

Signed-off-by: Corey Bryant 
---
 qemu-config.c   |   17 +
 qemu-config.h   |1 +
 qemu-options.hx |   17 +
 3 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index be84a03..64afac6 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -613,6 +613,22 @@ QemuOptsList qemu_boot_opts = {
 },
 };
 
+QemuOptsList qemu_filefd_opts = {
+.name = "filefd",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_filefd_opts.head),
+.desc = {
+{
+.name = "file",
+.type = QEMU_OPT_STRING,
+}, {
+.name = "fd",
+.type = QEMU_OPT_NUMBER,
+},
+{ /*End of list */ }
+},
+};
+
+
 static QemuOptsList *vm_config_groups[32] = {
 &qemu_drive_opts,
 &qemu_chardev_opts,
@@ -628,6 +644,7 @@ static QemuOptsList *vm_config_groups[32] = {
 &qemu_machine_opts,
 &qemu_boot_opts,
 &qemu_iscsi_opts,
+&qemu_filefd_opts,
 NULL,
 };
 
diff --git a/qemu-config.h b/qemu-config.h
index 6d7365d..6be54f1 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -4,6 +4,7 @@
 extern QemuOptsList qemu_fsdev_opts;
 extern QemuOptsList qemu_virtfs_opts;
 extern QemuOptsList qemu_spice_opts;
+extern QemuOptsList qemu_filefd_opts;
 
 QemuOptsList *qemu_find_opts(const char *group);
 void qemu_add_opts(QemuOptsList *list);
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..5f6782c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2743,6 +2743,23 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log,
 "-qtest-log LOG  specify tracing options\n",
 QEMU_ARCH_ALL)
 
+DEF("filefd", HAS_ARG, QEMU_OPTION_filefd,
+"-filefd file=,fd=\n"
+"passes a filename and its corresponding file 
descriptor\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -filefd file=@var{filename},fd=@var{fd}
+@findex -filefd
+This is used when a management application opens a file on behalf of QEMU.
+Instead of performing an open, QEMU will use the fd passed on this option.
+@table @option
+@item file=@var{filename}
+The name of the file.
+@item fd=@var{fd}
+The file descriptor that corresponds to @var{filename}.
+@end table
+ETEXI
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
-- 
1.7.7.6




[Qemu-devel] [PATCH 7/9] mips-dis: declare four globals to be "static"

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 mips-dis.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mips-dis.c b/mips-dis.c
index e3a6e0b..f6109a1 100644
--- a/mips-dis.c
+++ b/mips-dis.c
@@ -888,10 +888,9 @@ enum
Many instructions are short hand for other instructions (i.e., The
jal  instruction is short for jalr ).  */

-extern const struct mips_opcode mips_builtin_opcodes[];
-extern const int bfd_mips_num_builtin_opcodes;
-extern struct mips_opcode *mips_opcodes;
-extern int bfd_mips_num_opcodes;
+static const struct mips_opcode mips_builtin_opcodes[];
+static const int bfd_mips_num_builtin_opcodes;
+static int bfd_mips_num_opcodes;
 #define NUMOPCODES bfd_mips_num_opcodes

 
@@ -1203,7 +1202,7 @@ extern const int bfd_mips16_num_opcodes;
Many instructions are short hand for other instructions (i.e., The
jal  instruction is short for jalr ).  */

-const struct mips_opcode mips_builtin_opcodes[] =
+static const struct mips_opcode mips_builtin_opcodes[] =
 {
 /* These instructions appear first so that the disassembler will find
them first.  The assemblers uses a hash table based on the
@@ -2756,13 +2755,13 @@ const struct mips_opcode mips_builtin_opcodes[] =

 #define MIPS_NUM_OPCODES \
((sizeof mips_builtin_opcodes) / (sizeof (mips_builtin_opcodes[0])))
-const int bfd_mips_num_builtin_opcodes = MIPS_NUM_OPCODES;
+static const int bfd_mips_num_builtin_opcodes = MIPS_NUM_OPCODES;

 /* const removed from the following to allow for dynamic extensions to the
  * built-in instruction set. */
-struct mips_opcode *mips_opcodes =
+static struct mips_opcode *mips_opcodes =
   (struct mips_opcode *) mips_builtin_opcodes;
-int bfd_mips_num_opcodes = MIPS_NUM_OPCODES;
+static int bfd_mips_num_opcodes = MIPS_NUM_OPCODES;
 #undef MIPS_NUM_OPCODES

 /* Mips instructions are at maximum this many bytes long.  */
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH v2 0/3] Event notifications for balloon driver

2012-05-21 Thread Daniel P. Berrange
This series is a followup to 2 previously posted patches

 * BALLOON_CHANGE QMP event:
   http://lists.nongnu.org/archive/html/qemu-devel/2012-05/msg02215.html

 * query-events QMP command: 
   http://lists.nongnu.org/archive/html/qemu-devel/2012-05/msg02255.html

Changes since v1:

 - Use a static array of strings for QMP event ID -> string conversion
 - Add BALLOON_CHANGE to qmp-events.txt

There is also a new patch in this series, which introduces the ability
todo simple rate limiting of stateless monitor events. With the ballooning
of a 1.8 GB guest, down to 0.9 GB this reduced the number of events
emitted from ~50 down to just 4, spread across a 4 second time window.
It is important to note that events are only discarded when they are
obsoleted by a newer event. So an application is guarenteed to see the
final balloon event, with at worst a 1 second delay.

This series is being tested with a corresponding patch to libvirt
for handling balloon events

https://www.redhat.com/archives/libvir-list/2012-May/msg00868.html




[Qemu-devel] [PATCH 16/16] qapi: convert netdev_del

2012-05-21 Thread Luiz Capitulino
Signed-off-by: Anthony Liguori 
Signed-off-by: Luiz Capitulino 
---
 hmp-commands.hx  |3 +--
 hmp.c|9 +
 hmp.h|1 +
 net.c|   11 +--
 net.h|1 -
 qapi-schema.json |   14 ++
 qmp-commands.hx  |5 +
 7 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d0ce6a5..f5d9d91 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1051,8 +1051,7 @@ ETEXI
 .args_type  = "id:s",
 .params = "id",
 .help   = "remove host network device",
-.user_print = monitor_user_noop,
-.mhandler.cmd_new = do_netdev_del,
+.mhandler.cmd = hmp_netdev_del,
 },
 
 STEXI
diff --git a/hmp.c b/hmp.c
index 7a4e25f..2ce8cb9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -990,3 +990,12 @@ void hmp_netdev_add(Monitor *mon, const QDict *qdict)
 out:
 hmp_handle_error(mon, &err);
 }
+
+void hmp_netdev_del(Monitor *mon, const QDict *qdict)
+{
+const char *id = qdict_get_str(qdict, "id");
+Error *err = NULL;
+
+qmp_netdev_del(id, &err);
+hmp_handle_error(mon, &err);
+}
diff --git a/hmp.h b/hmp.h
index 017df87..79d138d 100644
--- a/hmp.h
+++ b/hmp.h
@@ -63,5 +63,6 @@ void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
+void hmp_netdev_del(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/net.c b/net.c
index 5f0c53c..4aa416c 100644
--- a/net.c
+++ b/net.c
@@ -1269,19 +1269,18 @@ exit_err:
 return -1;
 }
 
-int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+void qmp_netdev_del(const char *id, Error **errp)
 {
-const char *id = qdict_get_str(qdict, "id");
 VLANClientState *vc;
 
 vc = qemu_find_netdev(id);
 if (!vc) {
-qerror_report(QERR_DEVICE_NOT_FOUND, id);
-return -1;
+error_set(errp, QERR_DEVICE_NOT_FOUND, id);
+return;
 }
+
 qemu_del_vlan_client(vc);
-qemu_opts_del(qemu_opts_find(qemu_find_opts("netdev"), id));
-return 0;
+qemu_opts_del(qemu_opts_find(qemu_find_opts_err("netdev", errp), id));
 }
 
 static void print_net_client(Monitor *mon, VLANClientState *vc)
diff --git a/net.h b/net.h
index 1eb9280..bdc2a06 100644
--- a/net.h
+++ b/net.h
@@ -172,7 +172,6 @@ void net_host_device_add(Monitor *mon, const QDict *qdict);
 void net_host_device_remove(Monitor *mon, const QDict *qdict);
 void netdev_add(QemuOpts *opts, Error **errp);
 int qmp_netdev_add(Monitor *mon, const QDict *qdict, QObject **ret);
-int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
 #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
diff --git a/qapi-schema.json b/qapi-schema.json
index 69fcd8e..bb1f806 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1826,3 +1826,17 @@
 { 'command': 'netdev_add',
   'data': {'type': 'str', 'id': 'str', '*props': '**'},
   'gen': 'no' }
+
+##
+# @netdev_del:
+#
+# Remove a network backend.
+#
+# @id: the name of the network backend to remove
+#
+# Returns: Nothing on success
+#  If @id is not a valid network backend, DeviceNotFound
+#
+# Since: 0.14.0
+##
+{ 'command': 'netdev_del', 'data': {'id': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index f6550cb..57ea803 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -671,10 +671,7 @@ EQMP
 {
 .name   = "netdev_del",
 .args_type  = "id:s",
-.params = "id",
-.help   = "remove host network device",
-.user_print = monitor_user_noop,
-.mhandler.cmd_new = do_netdev_del,
+.mhandler.cmd_new = qmp_marshal_input_netdev_del,
 },
 
 SQMP
-- 
1.7.9.2.384.g4a92a




[Qemu-devel] [PATCH 07/16] qemu-option: opt_set(): use error_set()

2012-05-21 Thread Luiz Capitulino
The functions qemu_opt_set() and opts_do_parse() both call opt_set(),
but their callers expect QError semantics. Thus, both functions call
qerro_report_err() to keep the expected semantics.

Signed-off-by: Luiz Capitulino 
---
 qemu-option.c |   31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/qemu-option.c b/qemu-option.c
index eee3b45..7af2b50 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -612,8 +612,8 @@ static void qemu_opt_del(QemuOpt *opt)
 g_free(opt);
 }
 
-static int opt_set(QemuOpts *opts, const char *name, const char *value,
-   bool prepend)
+static void opt_set(QemuOpts *opts, const char *name, const char *value,
+bool prepend, Error **errp)
 {
 QemuOpt *opt;
 const QemuOptDesc *desc = opts->list->desc;
@@ -629,8 +629,8 @@ static int opt_set(QemuOpts *opts, const char *name, const 
char *value,
 if (i == 0) {
 /* empty list -> allow any */;
 } else {
-qerror_report(QERR_INVALID_PARAMETER, name);
-return -1;
+error_set(errp, QERR_INVALID_PARAMETER, name);
+return;
 }
 }
 
@@ -650,18 +650,23 @@ static int opt_set(QemuOpts *opts, const char *name, 
const char *value,
 }
 qemu_opt_parse(opt, &local_err);
 if (error_is_set(&local_err)) {
-qerror_report_err(local_err);
-error_free(local_err);
+error_propagate(errp, local_err);
 qemu_opt_del(opt);
-return -1;
 }
-
-return 0;
 }
 
 int qemu_opt_set(QemuOpts *opts, const char *name, const char *value)
 {
-return opt_set(opts, name, value, false);
+Error *local_err = NULL;
+
+opt_set(opts, name, value, false, &local_err);
+if (error_is_set(&local_err)) {
+qerror_report_err(local_err);
+error_free(local_err);
+return -1;
+}
+
+return 0;
 }
 
 int qemu_opt_set_bool(QemuOpts *opts, const char *name, bool val)
@@ -847,6 +852,7 @@ static int opts_do_parse(QemuOpts *opts, const char *params,
 {
 char option[128], value[1024];
 const char *p,*pe,*pc;
+Error *local_err = NULL;
 
 for (p = params; *p != '\0'; p++) {
 pe = strchr(p, '=');
@@ -878,7 +884,10 @@ static int opts_do_parse(QemuOpts *opts, const char 
*params,
 }
 if (strcmp(option, "id") != 0) {
 /* store and parse */
-if (opt_set(opts, option, value, prepend) == -1) {
+opt_set(opts, option, value, prepend, &local_err);
+if (error_is_set(&local_err)) {
+qerror_report_err(local_err);
+error_free(local_err);
 return -1;
 }
 }
-- 
1.7.9.2.384.g4a92a




[Qemu-devel] [PATCH 10/16] qerror: introduce QERR_INVALID_OPTION_GROUP

2012-05-21 Thread Luiz Capitulino
Signed-off-by: Luiz Capitulino 
---
 qerror.c |4 
 qerror.h |3 +++
 2 files changed, 7 insertions(+)

diff --git a/qerror.c b/qerror.c
index 5092fe7..92c4eff 100644
--- a/qerror.c
+++ b/qerror.c
@@ -156,6 +156,10 @@ static const QErrorStringTable qerror_table[] = {
 .desc  = "Invalid block format '%(name)'",
 },
 {
+.error_fmt = QERR_INVALID_OPTION_GROUP,
+.desc  = "There is no option group '%(group)'",
+},
+{
 .error_fmt = QERR_INVALID_PARAMETER,
 .desc  = "Invalid parameter '%(name)'",
 },
diff --git a/qerror.h b/qerror.h
index 4cbba48..b4c8758 100644
--- a/qerror.h
+++ b/qerror.h
@@ -139,6 +139,9 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_INVALID_BLOCK_FORMAT \
 "{ 'class': 'InvalidBlockFormat', 'data': { 'name': %s } }"
 
+#define QERR_INVALID_OPTION_GROUP \
+"{ 'class': 'InvalidOptionGroup', 'data': { 'group': %s } }"
+
 #define QERR_INVALID_PARAMETER \
 "{ 'class': 'InvalidParameter', 'data': { 'name': %s } }"
 
-- 
1.7.9.2.384.g4a92a




[Qemu-devel] [PATCH qmp-next v5 00/16]: qapi: convert netdev_add & netdev_del

2012-05-21 Thread Luiz Capitulino
v5

- Simplify set_option_parameter() [Laszlo]
- Fix bad patch split in patch 15/16, hunk changing net_init_netdev()
  pertains to patch 14/16 [Laszlo]

 blockdev.c   |2 +-
 hmp-commands.hx  |6 +-
 hmp.c|   30 +
 hmp.h|2 +
 hw/pci-hotplug.c |8 ++-
 hw/qdev-monitor.c|7 +-
 hw/usb/dev-network.c |7 +-
 hw/usb/dev-storage.c |2 +-
 hw/watchdog.c|2 +-
 net.c|  104 --
 net.h|6 +-
 net/dump.c   |2 +-
 net/dump.h   |3 +-
 net/slirp.c  |5 +-
 net/slirp.h  |5 +-
 net/socket.c |8 +--
 net/socket.h |3 +-
 net/tap-win32.c  |2 +-
 net/tap.c|9 ++-
 net/tap.h|5 +-
 net/vde.c|2 +-
 net/vde.h|2 +-
 qapi-schema.json |   42 
 qemu-char.c  |8 ++-
 qemu-config.c|   43 ++---
 qemu-config.h|3 +
 qemu-option.c|  175 ++
 qemu-option.h|   11 +++-
 qemu-sockets.c   |8 +--
 qerror.c |4 ++
 qerror.h |3 +
 qmp-commands.hx  |   10 +--
 vl.c |   22 ---
 33 files changed, 380 insertions(+), 171 deletions(-)



Re: [Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static

2012-05-21 Thread Jim Meyering
Peter Maydell wrote:
> On 21 May 2012 20:51, Jim Meyering  wrote:
>> From: Jim Meyering 
>>
>>
>> Signed-off-by: Jim Meyering 
>> ---
>>  tcg/tcg.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>> index ab589c7..350fdad 100644
>> --- a/tcg/tcg.c
>> +++ b/tcg/tcg.c
>> @@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)
>>
>>  /* Must statically initialize the version, because GDB may check
>>    the version before we can set it.  */
>> -struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
>> +static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
>>
>>  /* End GDB interface.  */
>
> Nak. This symbol is global so that gdb can find it by fishing around
> in the executable's symbol table.

Thanks for the quick feedback.

How does the scope of the symbol affect whether gdb can find it?
With it declared static, it seems to be no less visible than before:

$ gdb --eval 'p __jit_debug_descriptor' x86_64-softmmu/qemu-system-x86_64
Reading symbols from 
/h/j/w/co/qemu/x86_64-softmmu/qemu-system-x86_64...done.
$1 = {
  version = 1,
  action_flag = 0,
  relevant_entry = 0x0,
  first_entry = 0x0
}

If declaring this variable "static" is not appropriate,
then the comment saying that static initialization is desired
should be changed.



[Qemu-devel] [PATCH 10/19] msix: Invoke msix_handle_mask_update on msix_mask_all

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

In preparation of firing vector notifiers on mask changes, call
msix_handle_mask_update also from msix_mask_all. So far, this will have
no real effect.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/msix.c |4 
 1 file changed, 4 insertions(+)

diff --git a/hw/msix.c b/hw/msix.c
index 3197465..e1a7d92 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -231,10 +231,14 @@ static void msix_mmio_setup(PCIDevice *d, MemoryRegion 
*bar)
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 {
 int vector;
+
 for (vector = 0; vector < nentries; ++vector) {
 unsigned offset =
 vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
+bool was_masked = msix_is_masked(dev, vector);
+
 dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
+msix_handle_mask_update(dev, vector, was_masked);
 }
 }
 
-- 
1.7.10.1




[Qemu-devel] [PATCH 11/16] qemu-config: find_list(): use error_set()

2012-05-21 Thread Luiz Capitulino
Note that qemu_find_opts() and qemu_config_parse() need to call
error_report() to maintain their semantics on error.

Signed-off-by: Luiz Capitulino 
---
 qemu-config.c |   32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index f876646..bdb381d 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -3,6 +3,7 @@
 #include "qemu-option.h"
 #include "qemu-config.h"
 #include "hw/qdev.h"
+#include "error.h"
 
 static QemuOptsList qemu_drive_opts = {
 .name = "drive",
@@ -631,7 +632,8 @@ static QemuOptsList *vm_config_groups[32] = {
 NULL,
 };
 
-static QemuOptsList *find_list(QemuOptsList **lists, const char *group)
+static QemuOptsList *find_list(QemuOptsList **lists, const char *group,
+   Error **errp)
 {
 int i;
 
@@ -640,14 +642,23 @@ static QemuOptsList *find_list(QemuOptsList **lists, 
const char *group)
 break;
 }
 if (lists[i] == NULL) {
-error_report("there is no option group \"%s\"", group);
+error_set(errp, QERR_INVALID_OPTION_GROUP, group);
 }
 return lists[i];
 }
 
 QemuOptsList *qemu_find_opts(const char *group)
 {
-return find_list(vm_config_groups, group);
+QemuOptsList *ret;
+Error *local_err = NULL;
+
+ret = find_list(vm_config_groups, group, &local_err);
+if (error_is_set(&local_err)) {
+error_report("%s\n", error_get_pretty(local_err));
+error_free(local_err);
+}
+
+return ret;
 }
 
 void qemu_add_opts(QemuOptsList *list)
@@ -762,6 +773,7 @@ int qemu_config_parse(FILE *fp, QemuOptsList **lists, const 
char *fname)
 char line[1024], group[64], id[64], arg[64], value[1024];
 Location loc;
 QemuOptsList *list = NULL;
+Error *local_err = NULL;
 QemuOpts *opts = NULL;
 int res = -1, lno = 0;
 
@@ -778,17 +790,23 @@ int qemu_config_parse(FILE *fp, QemuOptsList **lists, 
const char *fname)
 }
 if (sscanf(line, "[%63s \"%63[^\"]\"]", group, id) == 2) {
 /* group with id */
-list = find_list(lists, group);
-if (list == NULL)
+list = find_list(lists, group, &local_err);
+if (error_is_set(&local_err)) {
+error_report("%s\n", error_get_pretty(local_err));
+error_free(local_err);
 goto out;
+}
 opts = qemu_opts_create(list, id, 1, NULL);
 continue;
 }
 if (sscanf(line, "[%63[^]]]", group) == 1) {
 /* group without id */
-list = find_list(lists, group);
-if (list == NULL)
+list = find_list(lists, group, &local_err);
+if (error_is_set(&local_err)) {
+error_report("%s\n", error_get_pretty(local_err));
+error_free(local_err);
 goto out;
+}
 opts = qemu_opts_create(list, NULL, 0, NULL);
 continue;
 }
-- 
1.7.9.2.384.g4a92a




[Qemu-devel] [PATCH 5/9] ccid: make backend_enum_table "static const" and adjust users

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 hw/ccid-card-emulated.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c
index f4a6da4..440f050 100644
--- a/hw/ccid-card-emulated.c
+++ b/hw/ccid-card-emulated.c
@@ -462,14 +462,14 @@ typedef struct EnumTable {
 uint32_t value;
 } EnumTable;

-EnumTable backend_enum_table[] = {
+static const EnumTable backend_enum_table[] = {
 {BACKEND_NSS_EMULATED_NAME, BACKEND_NSS_EMULATED},
 {BACKEND_CERTIFICATES_NAME, BACKEND_CERTIFICATES},
 {NULL, 0},
 };

 static uint32_t parse_enumeration(char *str,
-EnumTable *table, uint32_t not_found_value)
+const EnumTable *table, uint32_t not_found_value)
 {
 uint32_t ret = not_found_value;

@@ -487,7 +487,7 @@ static int emulated_initfn(CCIDCardState *base)
 {
 EmulatedState *card = DO_UPCAST(EmulatedState, base, base);
 VCardEmulError ret;
-EnumTable *ptable;
+const EnumTable *ptable;

 QSIMPLEQ_INIT(&card->event_list);
 QSIMPLEQ_INIT(&card->guest_apdu_list);
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 13/19] kvm: Introduce kvm_irqchip_add_msi_route

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Add a service that establishes a static route from a virtual IRQ line to
an MSI message. Will be used for IRQFD and device assignment. As we will
use this service outside of CONFIG_KVM protected code, stub it properly.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 kvm-all.c  |   31 +++
 kvm-stub.c |8 
 kvm.h  |   10 ++
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 0117837..7f906ca 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1080,6 +1080,32 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 return kvm_irqchip_set_irq(s, route->kroute.gsi, 1);
 }
 
+int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
+{
+struct kvm_irq_routing_entry kroute;
+int virq;
+
+if (!kvm_irqchip_in_kernel()) {
+return -ENOSYS;
+}
+
+virq = kvm_irqchip_get_virq(s);
+if (virq < 0) {
+return virq;
+}
+
+kroute.gsi = virq;
+kroute.type = KVM_IRQ_ROUTING_MSI;
+kroute.flags = 0;
+kroute.u.msi.address_lo = (uint32_t)msg.address;
+kroute.u.msi.address_hi = msg.address >> 32;
+kroute.u.msi.data = msg.data;
+
+kvm_add_routing_entry(s, &kroute);
+
+return virq;
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 static void kvm_init_irq_routing(KVMState *s)
@@ -1090,6 +1116,11 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 {
 abort();
 }
+
+int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
+{
+abort();
+}
 #endif /* !KVM_CAP_IRQ_ROUTING */
 
 static int kvm_irqchip_create(KVMState *s)
diff --git a/kvm-stub.c b/kvm-stub.c
index 47c573d..db3a7dc 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -12,10 +12,13 @@
 
 #include "qemu-common.h"
 #include "hw/hw.h"
+#include "hw/msi.h"
 #include "cpu.h"
 #include "gdbstub.h"
 #include "kvm.h"
 
+KVMState *kvm_state;
+
 int kvm_init_vcpu(CPUArchState *env)
 {
 return -ENOSYS;
@@ -128,3 +131,8 @@ int kvm_on_sigbus(int code, void *addr)
 {
 return 1;
 }
+
+int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
+{
+return -ENOSYS;
+}
diff --git a/kvm.h b/kvm.h
index 8b061bd..67df1f1 100644
--- a/kvm.h
+++ b/kvm.h
@@ -44,6 +44,10 @@ typedef struct KVMCapabilityInfo {
 #define KVM_CAP_INFO(CAP) { "KVM_CAP_" stringify(CAP), KVM_CAP_##CAP }
 #define KVM_CAP_LAST_INFO { NULL, 0 }
 
+struct KVMState;
+typedef struct KVMState KVMState;
+extern KVMState *kvm_state;
+
 /* external API */
 
 int kvm_init(void);
@@ -88,10 +92,6 @@ int kvm_on_sigbus(int code, void *addr);
 
 /* internal API */
 
-struct KVMState;
-typedef struct KVMState KVMState;
-extern KVMState *kvm_state;
-
 int kvm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vm_ioctl(KVMState *s, int type, ...);
@@ -213,4 +213,6 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t 
val, bool assign,
uint32_t size);
 
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool 
assign);
+
+int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
 #endif
-- 
1.7.10.1




[Qemu-devel] [PATCH 13/16] net: purge the monitor object from all init functions

2012-05-21 Thread Luiz Capitulino
The only backend that really uses it is the socket one, which calls
monitor_get_fd(). But it can use 'cur_mon' instead.

Signed-off-by: Luiz Capitulino 
---
 hw/pci-hotplug.c |2 +-
 hw/usb/dev-network.c |2 +-
 net.c|   18 +++---
 net.h|2 +-
 net/dump.c   |2 +-
 net/dump.h   |3 +--
 net/slirp.c  |5 +
 net/slirp.h  |5 +
 net/socket.c |8 +++-
 net/socket.h |3 +--
 net/tap-win32.c  |2 +-
 net/tap.c|9 -
 net/tap.h|5 ++---
 net/vde.c|2 +-
 net/vde.h|2 +-
 15 files changed, 27 insertions(+), 43 deletions(-)

diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index c55d8b9..785eb3d 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -60,7 +60,7 @@ static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon,
 
 qemu_opt_set(opts, "type", "nic");
 
-ret = net_client_init(mon, opts, 0);
+ret = net_client_init(opts, 0);
 if (ret < 0)
 return NULL;
 if (nd_table[ret].devaddr) {
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index b238a09..4e26e64 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1367,7 +1367,7 @@ static USBDevice *usb_net_init(USBBus *bus, const char 
*cmdline)
 qemu_opt_set(opts, "type", "nic");
 qemu_opt_set(opts, "model", "usb");
 
-idx = net_client_init(NULL, opts, 0);
+idx = net_client_init(opts, 0);
 if (idx == -1) {
 return NULL;
 }
diff --git a/net.c b/net.c
index 246209f..adb7e20 100644
--- a/net.c
+++ b/net.c
@@ -745,10 +745,7 @@ int net_handle_fd_param(Monitor *mon, const char *param)
 return fd;
 }
 
-static int net_init_nic(QemuOpts *opts,
-Monitor *mon,
-const char *name,
-VLANState *vlan)
+static int net_init_nic(QemuOpts *opts, const char *name, VLANState *vlan)
 {
 int idx;
 NICInfo *nd;
@@ -821,7 +818,6 @@ static int net_init_nic(QemuOpts *opts,
  }
 
 typedef int (*net_client_init_func)(QemuOpts *opts,
-Monitor *mon,
 const char *name,
 VLANState *vlan);
 
@@ -1085,7 +1081,7 @@ static const struct {
 #endif /* CONFIG_NET_BRIDGE */
 };
 
-int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
+int net_client_init(QemuOpts *opts, int is_netdev)
 {
 const char *name;
 const char *type;
@@ -1156,7 +1152,7 @@ int net_client_init(Monitor *mon, QemuOpts *opts, int 
is_netdev)
 
 ret = 0;
 if (net_client_types[i].init) {
-ret = net_client_types[i].init(opts, mon, name, vlan);
+ret = net_client_types[i].init(opts, name, vlan);
 if (ret < 0) {
 /* TODO push error reporting into init() methods */
 qerror_report(QERR_DEVICE_INIT_FAILED, type);
@@ -1213,7 +1209,7 @@ void net_host_device_add(Monitor *mon, const QDict *qdict)
 
 qemu_opt_set(opts, "type", device);
 
-if (net_client_init(mon, opts, 0) < 0) {
+if (net_client_init(opts, 0) < 0) {
 monitor_printf(mon, "adding host network device %s failed\n", device);
 }
 }
@@ -1248,7 +1244,7 @@ int do_netdev_add(Monitor *mon, const QDict *qdict, 
QObject **ret_data)
 return -1;
 }
 
-res = net_client_init(mon, opts, 1);
+res = net_client_init(opts, 1);
 if (res < 0) {
 qemu_opts_del(opts);
 }
@@ -1431,14 +1427,14 @@ void net_check_clients(void)
 
 static int net_init_client(QemuOpts *opts, void *dummy)
 {
-if (net_client_init(NULL, opts, 0) < 0)
+if (net_client_init(opts, 0) < 0)
 return -1;
 return 0;
 }
 
 static int net_init_netdev(QemuOpts *opts, void *dummy)
 {
-return net_client_init(NULL, opts, 1);
+return net_client_init(opts, 1);
 }
 
 int net_init_clients(void)
diff --git a/net.h b/net.h
index 64993b4..9d1ed93 100644
--- a/net.h
+++ b/net.h
@@ -163,7 +163,7 @@ struct HCIInfo *qemu_next_hci(void);
 extern const char *legacy_tftp_prefix;
 extern const char *legacy_bootp_filename;
 
-int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev);
+int net_client_init(QemuOpts *opts, int is_netdev);
 int net_client_parse(QemuOptsList *opts_list, const char *str);
 int net_init_clients(void);
 void net_check_clients(void);
diff --git a/net/dump.c b/net/dump.c
index 4b48d48..f835c51 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -144,7 +144,7 @@ static int net_dump_init(VLANState *vlan, const char 
*device,
 return 0;
 }
 
-int net_init_dump(QemuOpts *opts, Monitor *mon, const char *name, VLANState 
*vlan)
+int net_init_dump(QemuOpts *opts, const char *name, VLANState *vlan)
 {
 int len;
 const char *file;
diff --git a/net/dump.h b/net/dump.h
index fdc91ad..2b5d9ba 100644
--- a/net/dump.h
+++ b/net/dump.h
@@ -27,7 +27,6 @@
 #include 

[Qemu-devel] [PATCH 18/19] msix: Add msix_nr_vectors_allocated

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Analogously to msi_nr_vectors_allocated, add a service for MSI-X. Will
be used by the virtio-pci layer.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/msix.c |5 +
 hw/msix.h |2 ++
 2 files changed, 7 insertions(+)

diff --git a/hw/msix.c b/hw/msix.c
index 1622e16..59c7a83 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -452,6 +452,11 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 msix_free_irq_entries(dev);
 }
 
+unsigned int msix_nr_vectors_allocated(const PCIDevice *dev)
+{
+return dev->msix_entries_nr;
+}
+
 static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
 {
 MSIMessage msg;
diff --git a/hw/msix.h b/hw/msix.h
index f33f18b..50aee82 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -13,6 +13,8 @@ void msix_write_config(PCIDevice *pci_dev, uint32_t address,
 
 int msix_uninit(PCIDevice *d, MemoryRegion *bar);
 
+unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
+
 void msix_save(PCIDevice *dev, QEMUFile *f);
 void msix_load(PCIDevice *dev, QEMUFile *f);
 
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static

2012-05-21 Thread Peter Maydell
On 21 May 2012 20:51, Jim Meyering  wrote:
> From: Jim Meyering 
>
>
> Signed-off-by: Jim Meyering 
> ---
>  tcg/tcg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index ab589c7..350fdad 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)
>
>  /* Must statically initialize the version, because GDB may check
>    the version before we can set it.  */
> -struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
> +static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
>
>  /* End GDB interface.  */

Nak. This symbol is global so that gdb can find it by fishing around
in the executable's symbol table.

-- PMM



[Qemu-devel] [PATCH 8/9] bonito: declare bonito_state to be static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 hw/bonito.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/bonito.c b/hw/bonito.c
index 77786f8..6bd0242 100644
--- a/hw/bonito.c
+++ b/hw/bonito.c
@@ -218,7 +218,7 @@ typedef struct PCIBonitoState

 } PCIBonitoState;

-PCIBonitoState * bonito_state;
+static PCIBonitoState *bonito_state;

 static void bonito_writel(void *opaque, target_phys_addr_t addr,
   uint64_t val, unsigned size)
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 4/9] linux-user: arg_table need not have global scope

2012-05-21 Thread Jim Meyering
From: Jim Meyering 

Declare arg_table to be "static const", and adjust the two users
to also be const.

Signed-off-by: Jim Meyering 
---
 linux-user/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 191b750..cc5b54b 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3133,7 +3133,7 @@ struct qemu_argument {
 const char *help;
 };

-struct qemu_argument arg_table[] = {
+static const struct qemu_argument arg_table[] = {
 {"h",  "", false, handle_arg_help,
  "",   "print this help"},
 {"g",  "QEMU_GDB", true,  handle_arg_gdb,
@@ -3175,7 +3175,7 @@ struct qemu_argument arg_table[] = {

 static void usage(void)
 {
-struct qemu_argument *arginfo;
+const struct qemu_argument *arginfo;
 int maxarglen;
 int maxenvlen;

@@ -3241,7 +3241,7 @@ static int parse_args(int argc, char **argv)
 {
 const char *r;
 int optind;
-struct qemu_argument *arginfo;
+const struct qemu_argument *arginfo;

 for (arginfo = arg_table; arginfo->handle_opt != NULL; arginfo++) {
 if (arginfo->env == NULL) {
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 6/9] sheepdog: declare bdrv_sheepdog to be static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 block/sheepdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index e01d371..fdb3eca 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2014,7 +2014,7 @@ static QEMUOptionParameter sd_create_options[] = {
 { NULL }
 };

-BlockDriver bdrv_sheepdog = {
+static BlockDriver bdrv_sheepdog = {
 .format_name= "sheepdog",
 .protocol_name  = "sheepdog",
 .instance_size  = sizeof(BDRVSheepdogState),
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 03/19] kvm: Introduce basic MSI support for in-kernel irqchips

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

This patch basically adds kvm_irqchip_send_msi, a service for sending
arbitrary MSI messages to KVM's in-kernel irqchip models.

As the original KVM API requires us to establish a static route from a
pseudo GSI to the target MSI message and inject the MSI via toggling
that virtual IRQ, we need to play some tricks to make this interface
transparent. We create those routes on demand and keep them in a hash
table. Succeeding messages can then search for an existing route in the
table first and reuse it whenever possible. If we should run out of
limited GSIs, we simply flush the table and rebuild it as messages are
sent.

This approach is rather simple and could be optimized further. However,
latest kernels contains a more efficient MSI injection interface that
will obsolete the GSI-based dynamic injection.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 kvm-all.c |  139 -
 kvm.h |1 +
 2 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2d82d54..ff0534b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -24,6 +24,7 @@
 #include "qemu-barrier.h"
 #include "sysemu.h"
 #include "hw/hw.h"
+#include "hw/msi.h"
 #include "gdbstub.h"
 #include "kvm.h"
 #include "bswap.h"
@@ -48,6 +49,8 @@
 do { } while (0)
 #endif
 
+#define KVM_MSI_HASHTAB_SIZE256
+
 typedef struct KVMSlot
 {
 target_phys_addr_t start_addr;
@@ -59,6 +62,11 @@
 
 typedef struct kvm_dirty_log KVMDirtyLog;
 
+typedef struct KVMMSIRoute {
+struct kvm_irq_routing_entry kroute;
+QTAILQ_ENTRY(KVMMSIRoute) entry;
+} KVMMSIRoute;
+
 struct KVMState
 {
 KVMSlot slots[32];
@@ -87,6 +95,7 @@ struct KVMState
 int nr_allocated_irq_routes;
 uint32_t *used_gsi_bitmap;
 unsigned int gsi_count;
+QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
 #endif
 };
 
@@ -862,9 +871,14 @@ static void set_gsi(KVMState *s, unsigned int gsi)
 s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
 }
 
+static void clear_gsi(KVMState *s, unsigned int gsi)
+{
+s->used_gsi_bitmap[gsi / 32] &= ~(1U << (gsi % 32));
+}
+
 static void kvm_init_irq_routing(KVMState *s)
 {
-int gsi_count;
+int gsi_count, i;
 
 gsi_count = kvm_check_extension(s, KVM_CAP_IRQ_ROUTING);
 if (gsi_count > 0) {
@@ -884,6 +898,10 @@ static void kvm_init_irq_routing(KVMState *s)
 s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
 s->nr_allocated_irq_routes = 0;
 
+for (i = 0; i < KVM_MSI_HASHTAB_SIZE; i++) {
+QTAILQ_INIT(&s->msi_hashtab[i]);
+}
+
 kvm_arch_init_irq_routing(s);
 }
 
@@ -934,11 +952,130 @@ int kvm_irqchip_commit_routes(KVMState *s)
 return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
 }
 
+static void kvm_irqchip_release_virq(KVMState *s, int virq)
+{
+struct kvm_irq_routing_entry *e;
+int i;
+
+for (i = 0; i < s->irq_routes->nr; i++) {
+e = &s->irq_routes->entries[i];
+if (e->gsi == virq) {
+s->irq_routes->nr--;
+*e = s->irq_routes->entries[s->irq_routes->nr];
+}
+}
+clear_gsi(s, virq);
+}
+
+static unsigned int kvm_hash_msi(uint32_t data)
+{
+/* This is optimized for IA32 MSI layout. However, no other arch shall
+ * repeat the mistake of not providing a direct MSI injection API. */
+return data & 0xff;
+}
+
+static void kvm_flush_dynamic_msi_routes(KVMState *s)
+{
+KVMMSIRoute *route, *next;
+unsigned int hash;
+
+for (hash = 0; hash < KVM_MSI_HASHTAB_SIZE; hash++) {
+QTAILQ_FOREACH_SAFE(route, &s->msi_hashtab[hash], entry, next) {
+kvm_irqchip_release_virq(s, route->kroute.gsi);
+QTAILQ_REMOVE(&s->msi_hashtab[hash], route, entry);
+g_free(route);
+}
+}
+}
+
+static int kvm_irqchip_get_virq(KVMState *s)
+{
+uint32_t *word = s->used_gsi_bitmap;
+int max_words = ALIGN(s->gsi_count, 32) / 32;
+int i, bit;
+bool retry = true;
+
+again:
+/* Return the lowest unused GSI in the bitmap */
+for (i = 0; i < max_words; i++) {
+bit = ffs(~word[i]);
+if (!bit) {
+continue;
+}
+
+return bit - 1 + i * 32;
+}
+if (retry) {
+retry = false;
+kvm_flush_dynamic_msi_routes(s);
+goto again;
+}
+return -ENOSPC;
+
+}
+
+static KVMMSIRoute *kvm_lookup_msi_route(KVMState *s, MSIMessage msg)
+{
+unsigned int hash = kvm_hash_msi(msg.data);
+KVMMSIRoute *route;
+
+QTAILQ_FOREACH(route, &s->msi_hashtab[hash], entry) {
+if (route->kroute.u.msi.address_lo == (uint32_t)msg.address &&
+route->kroute.u.msi.address_hi == (msg.address >> 32) &&
+route->kroute.u.msi.data == msg.data) {
+return route;
+}
+}
+return NULL;
+}
+
+int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
+{
+KVMMSIRoute *route;
+
+route = kvm_lookup_msi_route(s, msg);
+if (

[Qemu-devel] [PATCH 16/19] kvm: Introduce kvm_irqchip_add/remove_irqfd

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Add services to associate an eventfd file descriptor as input with an
IRQ line as output. Such a line can be an input pin of an in-kernel
irqchip or a virtual line returned by kvm_irqchip_add_route.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 kvm-all.c  |   30 ++
 kvm-stub.c |   10 ++
 kvm.h  |3 +++
 3 files changed, 43 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index e96f092..489ee53 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1108,6 +1108,21 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage 
msg)
 return virq;
 }
 
+static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
+{
+struct kvm_irqfd irqfd = {
+.fd = fd,
+.gsi = virq,
+.flags = assign ? 0 : KVM_IRQFD_FLAG_DEASSIGN,
+};
+
+if (!kvm_irqchip_in_kernel()) {
+return -ENOSYS;
+}
+
+return kvm_vm_ioctl(s, KVM_IRQFD, &irqfd);
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 static void kvm_init_irq_routing(KVMState *s)
@@ -1123,8 +1138,23 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage 
msg)
 {
 abort();
 }
+
+static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
+{
+abort();
+}
 #endif /* !KVM_CAP_IRQ_ROUTING */
 
+int kvm_irqchip_add_irqfd(KVMState *s, int fd, int virq)
+{
+return kvm_irqchip_assign_irqfd(s, fd, virq, true);
+}
+
+int kvm_irqchip_remove_irqfd(KVMState *s, int fd, int virq)
+{
+return kvm_irqchip_assign_irqfd(s, fd, virq, false);
+}
+
 static int kvm_irqchip_create(KVMState *s)
 {
 QemuOptsList *list = qemu_find_opts("machine");
diff --git a/kvm-stub.c b/kvm-stub.c
index ec351d9..b4cf03f 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -140,3 +140,13 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 void kvm_irqchip_release_virq(KVMState *s, int virq)
 {
 }
+
+int kvm_irqchip_add_irqfd(KVMState *s, int fd, int virq)
+{
+return -ENOSYS;
+}
+
+int kvm_irqchip_remove_irqfd(KVMState *s, int fd, int virq)
+{
+return -ENOSYS;
+}
diff --git a/kvm.h b/kvm.h
index f0d0c53..9c7b0ea 100644
--- a/kvm.h
+++ b/kvm.h
@@ -215,4 +215,7 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, 
uint16_t val, bool assign);
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
+
+int kvm_irqchip_add_irqfd(KVMState *s, int fd, int virq);
+int kvm_irqchip_remove_irqfd(KVMState *s, int fd, int virq);
 #endif
-- 
1.7.10.1




[Qemu-devel] [PATCH 3/9] alpha-dis: remove unused global; declare others to be static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 

alpha_num_operands: Remove both declarations of this unused global.
alpha_opcodes: Declare static to limit scope. Remove duplicate decl.
alpha_num_opcodes: Likewise.
alpha_operands: Likewise.

Signed-off-by: Jim Meyering 
---
 alpha-dis.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/alpha-dis.c b/alpha-dis.c
index ae331b3..f39b56d 100644
--- a/alpha-dis.c
+++ b/alpha-dis.c
@@ -53,12 +53,6 @@ struct alpha_opcode
   unsigned char operands[4];
 };

-/* The table itself is sorted by major opcode number, and is otherwise
-   in the order in which the disassembler should consider
-   instructions.  */
-extern const struct alpha_opcode alpha_opcodes[];
-extern const unsigned alpha_num_opcodes;
-
 /* Values defined for the flags field of a struct alpha_opcode.  */

 /* CPU Availability */
@@ -134,12 +128,6 @@ struct alpha_operand
   int (*extract) (unsigned instruction, int *invalid);
 };

-/* Elements in the table are retrieved by indexing with values from
-   the operands field of the alpha_opcodes table.  */
-
-extern const struct alpha_operand alpha_operands[];
-extern const unsigned alpha_num_operands;
-
 /* Values defined for the flags field of a struct alpha_operand.  */

 /* Mask for selecting the type for typecheck purposes */
@@ -292,8 +280,10 @@ static int extract_ev6hwjhint (unsigned, int *);

 
 /* The operands table  */
+/* Elements are retrieved by indexing with values from
+   the operands field of the alpha_opcodes table.  */

-const struct alpha_operand alpha_operands[] =
+static const struct alpha_operand alpha_operands[] =
 {
   /* The fields are bits, shift, insert, extract, flags */
   /* The zero index is used to indicate end-of-list */
@@ -424,8 +414,6 @@ const struct alpha_operand alpha_operands[] =
 insert_ev6hwjhint, extract_ev6hwjhint }
 };

-const unsigned alpha_num_operands = 
sizeof(alpha_operands)/sizeof(*alpha_operands);
-
 /* The RB field when it is the same as the RA field in the same insn.
This operand is marked fake.  The insertion function just copies
the RA field into the RB field, and the extraction function just
@@ -662,6 +650,9 @@ extract_ev6hwjhint(unsigned insn, int *invalid 
ATTRIBUTE_UNUSED)
 #define ARG_EV6HWMEM   { RA, EV6HWDISP, PRB }
 
 /* The opcode table.
+   The table itself is sorted by major opcode number, and is otherwise
+   in the order in which the disassembler should consider
+   instructions.

The format of the opcode table is:

@@ -706,7 +697,7 @@ extract_ev6hwjhint(unsigned insn, int *invalid 
ATTRIBUTE_UNUSED)
that were not assigned to a particular extension.
 */

-const struct alpha_opcode alpha_opcodes[] = {
+static const struct alpha_opcode alpha_opcodes[] = {
   { "halt",SPCD(0x00,0x), BASE, ARG_NONE },
   { "draina",  SPCD(0x00,0x0002), BASE, ARG_NONE },
   { "bpt", SPCD(0x00,0x0080), BASE, ARG_NONE },
@@ -1732,7 +1723,8 @@ const struct alpha_opcode alpha_opcodes[] = {
   { "bgt", BRA(0x3F), BASE, ARG_BRA },
 };

-const unsigned alpha_num_opcodes = 
sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);
+static const unsigned alpha_num_opcodes =
+  sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);

 /* OSF register names.  */

-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 09/19] msix: Factor out msix_get_message

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

This helper will also be used by the upcoming config notifier.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/msix.c |   19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3835eaa..3197465 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -35,6 +35,15 @@
 #define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
 #define MSIX_MAX_ENTRIES 32
 
+static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
+{
+uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
+MSIMessage msg;
+
+msg.address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
+msg.data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
+return msg;
+}
 
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
@@ -352,9 +361,7 @@ uint32_t msix_bar_size(PCIDevice *dev)
 /* Send an MSI-X message */
 void msix_notify(PCIDevice *dev, unsigned vector)
 {
-uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
-uint64_t address;
-uint32_t data;
+MSIMessage msg;
 
 if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
 return;
@@ -363,9 +370,9 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 return;
 }
 
-address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
-data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
-stl_le_phys(address, data);
+msg = msix_get_message(dev, vector);
+
+stl_le_phys(msg.address, msg.data);
 }
 
 void msix_reset(PCIDevice *dev)
-- 
1.7.10.1




[Qemu-devel] [PATCH 3/9] alpha-dis: remove unused global; declare others to be static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 

alpha_num_operands: Remove both declarations of this unused global.
alpha_opcodes: Declare static to limit scope. Remove duplicate decl.
alpha_num_opcodes: Likewise.
alpha_operands: Likewise.

Signed-off-by: Jim Meyering 
---
 alpha-dis.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/alpha-dis.c b/alpha-dis.c
index ae331b3..f39b56d 100644
--- a/alpha-dis.c
+++ b/alpha-dis.c
@@ -53,12 +53,6 @@ struct alpha_opcode
   unsigned char operands[4];
 };

-/* The table itself is sorted by major opcode number, and is otherwise
-   in the order in which the disassembler should consider
-   instructions.  */
-extern const struct alpha_opcode alpha_opcodes[];
-extern const unsigned alpha_num_opcodes;
-
 /* Values defined for the flags field of a struct alpha_opcode.  */

 /* CPU Availability */
@@ -134,12 +128,6 @@ struct alpha_operand
   int (*extract) (unsigned instruction, int *invalid);
 };

-/* Elements in the table are retrieved by indexing with values from
-   the operands field of the alpha_opcodes table.  */
-
-extern const struct alpha_operand alpha_operands[];
-extern const unsigned alpha_num_operands;
-
 /* Values defined for the flags field of a struct alpha_operand.  */

 /* Mask for selecting the type for typecheck purposes */
@@ -292,8 +280,10 @@ static int extract_ev6hwjhint (unsigned, int *);

 
 /* The operands table  */
+/* Elements are retrieved by indexing with values from
+   the operands field of the alpha_opcodes table.  */

-const struct alpha_operand alpha_operands[] =
+static const struct alpha_operand alpha_operands[] =
 {
   /* The fields are bits, shift, insert, extract, flags */
   /* The zero index is used to indicate end-of-list */
@@ -424,8 +414,6 @@ const struct alpha_operand alpha_operands[] =
 insert_ev6hwjhint, extract_ev6hwjhint }
 };

-const unsigned alpha_num_operands = 
sizeof(alpha_operands)/sizeof(*alpha_operands);
-
 /* The RB field when it is the same as the RA field in the same insn.
This operand is marked fake.  The insertion function just copies
the RA field into the RB field, and the extraction function just
@@ -662,6 +650,9 @@ extract_ev6hwjhint(unsigned insn, int *invalid 
ATTRIBUTE_UNUSED)
 #define ARG_EV6HWMEM   { RA, EV6HWDISP, PRB }
 
 /* The opcode table.
+   The table itself is sorted by major opcode number, and is otherwise
+   in the order in which the disassembler should consider
+   instructions.

The format of the opcode table is:

@@ -706,7 +697,7 @@ extract_ev6hwjhint(unsigned insn, int *invalid 
ATTRIBUTE_UNUSED)
that were not assigned to a particular extension.
 */

-const struct alpha_opcode alpha_opcodes[] = {
+static const struct alpha_opcode alpha_opcodes[] = {
   { "halt",SPCD(0x00,0x), BASE, ARG_NONE },
   { "draina",  SPCD(0x00,0x0002), BASE, ARG_NONE },
   { "bpt", SPCD(0x00,0x0080), BASE, ARG_NONE },
@@ -1732,7 +1723,8 @@ const struct alpha_opcode alpha_opcodes[] = {
   { "bgt", BRA(0x3F), BASE, ARG_BRA },
 };

-const unsigned alpha_num_opcodes = 
sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);
+static const unsigned alpha_num_opcodes =
+  sizeof(alpha_opcodes)/sizeof(*alpha_opcodes);

 /* OSF register names.  */

-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 tcg/tcg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ab589c7..350fdad 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)

 /* Must statically initialize the version, because GDB may check
the version before we can set it.  */
-struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
+static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

 /* End GDB interface.  */

-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 2/9] tcg: declare __jit_debug_descriptor to be static

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 tcg/tcg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ab589c7..350fdad 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2293,7 +2293,7 @@ void __jit_debug_register_code(void)

 /* Must statically initialize the version, because GDB may check
the version before we can set it.  */
-struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };
+static struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

 /* End GDB interface.  */

-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 1/9] ccid: declare DEFAULT_ATR table to be "static const"

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 hw/ccid-card-passthru.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ccid-card-passthru.c b/hw/ccid-card-passthru.c
index bd6c777..1caaa45 100644
--- a/hw/ccid-card-passthru.c
+++ b/hw/ccid-card-passthru.c
@@ -27,7 +27,7 @@ do {\
 #define D_VERBOSE 4

 /* TODO: do we still need this? */
-uint8_t DEFAULT_ATR[] = {
+static const uint8_t DEFAULT_ATR[] = {
 /*
  * From some example somewhere
  * 0x3B, 0xB0, 0x18, 0x00, 0xD1, 0x81, 0x05, 0xB1, 0x40, 0x38, 0x1F, 0x03, 0x28
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 1/9] ccid: declare DEFAULT_ATR table to be "static const"

2012-05-21 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 hw/ccid-card-passthru.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ccid-card-passthru.c b/hw/ccid-card-passthru.c
index bd6c777..1caaa45 100644
--- a/hw/ccid-card-passthru.c
+++ b/hw/ccid-card-passthru.c
@@ -27,7 +27,7 @@ do {\
 #define D_VERBOSE 4

 /* TODO: do we still need this? */
-uint8_t DEFAULT_ATR[] = {
+static const uint8_t DEFAULT_ATR[] = {
 /*
  * From some example somewhere
  * 0x3B, 0xB0, 0x18, 0x00, 0xD1, 0x81, 0x05, 0xB1, 0x40, 0x38, 0x1F, 0x03, 0x28
-- 
1.7.10.2.552.gaa3bb87




[Qemu-devel] [PATCH 0/9] convert many more globals to "static"

2012-05-21 Thread Jim Meyering
From: Jim Meyering 

Following up on discussion here,

  http://marc.info/?t=13375948768&r=1&w=2

here are patches to limit the scope of the remaining global variables.
Most changes simply added a preceding "static".  However, in some cases,
I've made minor additional changes, e.g., to make a static
table "const" as well (which sometimes required making an iterator
pointer const, too), and to move or remove declarations of variables
that the compiler then was able to identify as unused.

Initially I put the changes to each file in a separate commit, but that
got old quickly, and I lumped all of the remaining changes into the
9th commit.  If that's a problem, let me know and I'll separate it.


Jim Meyering (9):
  ccid: declare DEFAULT_ATR table to be "static const"
  tcg: declare __jit_debug_descriptor to be static
  alpha-dis: remove unused global; declare others to be static
  linux-user: arg_table need not have global scope
  ccid: make backend_enum_table "static const" and adjust users
  sheepdog: declare bdrv_sheepdog to be static
  mips-dis: declare four globals to be "static"
  bonito: declare bonito_state to be static
  convert many more globals to static

 alpha-dis.c   | 26 ++--
 arm-dis.c |  8 ++---
 block/sheepdog.c  |  2 +-
 cpus.c|  4 +--
 cris-dis.c|  2 +-
 hw/9pfs/virtio-9p-synth.c |  2 +-
 hw/bonito.c   |  2 +-
 hw/ccid-card-emulated.c   |  6 ++--
 hw/ccid-card-passthru.c   |  2 +-
 hw/ide/pci.c  |  2 +-
 hw/leon3.c|  2 +-
 hw/mips_fulong2e.c|  2 +-
 hw/s390-virtio-bus.c  |  2 +-
 hw/spapr_rtas.c   |  2 +-
 hw/xen_platform.c |  2 +-
 hw/xgmac.c|  2 +-
 linux-user/main.c |  6 ++--
 m68k-dis.c| 79 ---
 memory.c  |  2 +-
 microblaze-dis.c  |  6 ++--
 mips-dis.c| 15 +
 ppc-dis.c | 26 
 sh4-dis.c |  2 +-
 target-cris/translate.c   |  2 +-
 target-i386/cpu.c |  4 +--
 target-i386/kvm.c |  2 +-
 tcg/tcg.c |  2 +-
 tests/fdc-test.c  |  2 +-
 vl.c  | 12 ---
 29 files changed, 110 insertions(+), 118 deletions(-)

--
1.7.10.2.552.gaa3bb87



Re: [Qemu-devel] [PATCH v2 2/3] Add event notification for guest balloon changes

2012-05-21 Thread Daniel P. Berrange
On Tue, May 22, 2012 at 01:14:59AM +0530, Amit Shah wrote:
> On (Mon) 21 May 2012 [17:59:52], Daniel P. Berrange wrote:
> > From: "Daniel P. Berrange" 
> > 
> > After setting a balloon target value, applications have to
> > continually poll 'query-balloon' to determine whether the
> > guest has reacted to this request. The virtio-balloon backend
> > knows exactly when the guest has reacted though, and thus it
> > is possible to emit a JSON event to tell the mgmt application
> > whenever the guest balloon changes.
> > 
> > This introduces a new 'qemu_balloon_change()' API which is
> 
> I prefer qemu_balloon_changed(), it is clearer that this is called
> after a balloon value change.  qemu_balloon_change() can be taken to
> mean the function is called as a response the the monitor 'balloon'
> command.

Happy to change this.

> > +BALLOON_CHANGE
> > +--
> 
> similarly, this can be BALLOON_CHANGED

For the sake of consistency with the existing RTC_CHANGE event, I prefer
the naming I already have.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[Qemu-devel] [PATCH 14/19] kvm: Publicize kvm_irqchip_release_virq

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

This allows to drop routes created by kvm_irqchip_add_irq/msi_route
again.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 kvm-all.c  |2 +-
 kvm-stub.c |4 
 kvm.h  |1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7f906ca..ca6cec6 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -955,7 +955,7 @@ int kvm_irqchip_commit_routes(KVMState *s)
 return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
 }
 
-static void kvm_irqchip_release_virq(KVMState *s, int virq)
+void kvm_irqchip_release_virq(KVMState *s, int virq)
 {
 struct kvm_irq_routing_entry *e;
 int i;
diff --git a/kvm-stub.c b/kvm-stub.c
index db3a7dc..ec351d9 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -136,3 +136,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 {
 return -ENOSYS;
 }
+
+void kvm_irqchip_release_virq(KVMState *s, int virq)
+{
+}
diff --git a/kvm.h b/kvm.h
index 67df1f1..1779e73 100644
--- a/kvm.h
+++ b/kvm.h
@@ -215,4 +215,5 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t 
val, bool assign,
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool 
assign);
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
+void kvm_irqchip_release_virq(KVMState *s, int virq);
 #endif
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH v2 2/3] Add event notification for guest balloon changes

2012-05-21 Thread Amit Shah
On (Mon) 21 May 2012 [17:59:52], Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" 
> 
> After setting a balloon target value, applications have to
> continually poll 'query-balloon' to determine whether the
> guest has reacted to this request. The virtio-balloon backend
> knows exactly when the guest has reacted though, and thus it
> is possible to emit a JSON event to tell the mgmt application
> whenever the guest balloon changes.
> 
> This introduces a new 'qemu_balloon_change()' API which is

I prefer qemu_balloon_changed(), it is clearer that this is called
after a balloon value change.  qemu_balloon_change() can be taken to
mean the function is called as a response the the monitor 'balloon'
command.

> +BALLOON_CHANGE
> +--

similarly, this can be BALLOON_CHANGED


Amit



[Qemu-devel] [PATCH 19/19] virtio/vhost: Add support for KVM in-kernel MSI injection

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Make use of the new vector notifier to track changes of the MSI-X
configuration of virtio PCI devices. On enabling events, we establish
the required virtual IRQ to MSI-X message route and link the signaling
eventfd file descriptor to this vIRQ line. That way, vhost-generated
interrupts can be directly delivered to an in-kernel MSI-X consumer like
the x86 APIC.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/virtio-pci.c |  126 +++
 hw/virtio-pci.h |6 +++
 2 files changed, 132 insertions(+)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 4a4413d..01f5b92 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -24,6 +24,7 @@
 #include "virtio-scsi.h"
 #include "pci.h"
 #include "qemu-error.h"
+#include "msi.h"
 #include "msix.h"
 #include "net.h"
 #include "loader.h"
@@ -539,6 +540,107 @@ static void virtio_pci_guest_notifier_read(void *opaque)
 }
 }
 
+static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
+unsigned int queue_no,
+unsigned int vector,
+MSIMessage msg)
+{
+VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
+VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
+int fd, ret;
+
+fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vq));
+
+if (irqfd->users == 0) {
+ret = kvm_irqchip_add_msi_route(kvm_state, msg);
+if (ret < 0) {
+return ret;
+}
+irqfd->virq = ret;
+}
+irqfd->users++;
+
+ret = kvm_irqchip_add_irqfd(kvm_state, fd, irqfd->virq);
+if (ret < 0) {
+if (--irqfd->users == 0) {
+kvm_irqchip_release_virq(kvm_state, irqfd->virq);
+}
+return ret;
+}
+
+qemu_set_fd_handler(fd, NULL, NULL, NULL);
+
+return 0;
+}
+
+static void kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
+ unsigned int queue_no,
+ unsigned int vector)
+{
+VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
+VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
+int fd, ret;
+
+fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vq));
+
+ret = kvm_irqchip_remove_irqfd(kvm_state, fd, irqfd->virq);
+assert(ret == 0);
+
+if (--irqfd->users == 0) {
+kvm_irqchip_release_virq(kvm_state, irqfd->virq);
+}
+
+qemu_set_fd_handler(fd, virtio_pci_guest_notifier_read, NULL, vq);
+}
+
+static int kvm_virtio_pci_vector_use(PCIDevice *dev, unsigned vector,
+ MSIMessage msg)
+{
+VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev);
+VirtIODevice *vdev = proxy->vdev;
+int ret, queue_no;
+
+for (queue_no = 0; queue_no < VIRTIO_PCI_QUEUE_MAX; queue_no++) {
+if (!virtio_queue_get_num(vdev, queue_no)) {
+break;
+}
+if (virtio_queue_vector(vdev, queue_no) != vector) {
+continue;
+}
+ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector, msg);
+if (ret < 0) {
+goto undo;
+}
+}
+return 0;
+
+undo:
+while (--queue_no >= 0) {
+if (virtio_queue_vector(vdev, queue_no) != vector) {
+continue;
+}
+kvm_virtio_pci_vq_vector_release(proxy, queue_no, vector);
+}
+return ret;
+}
+
+static void kvm_virtio_pci_vector_release(PCIDevice *dev, unsigned vector)
+{
+VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev);
+VirtIODevice *vdev = proxy->vdev;
+int queue_no;
+
+for (queue_no = 0; queue_no < VIRTIO_PCI_QUEUE_MAX; queue_no++) {
+if (!virtio_queue_get_num(vdev, queue_no)) {
+break;
+}
+if (virtio_queue_vector(vdev, queue_no) != vector) {
+continue;
+}
+kvm_virtio_pci_vq_vector_release(proxy, queue_no, vector);
+}
+}
+
 static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -555,6 +657,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int 
n, bool assign)
 } else {
 qemu_set_fd_handler(event_notifier_get_fd(notifier),
 NULL, NULL, NULL);
+/* Test and clear notifier before closing it,
+ * in case poll callback didn't have time to run. */
+virtio_pci_guest_notifier_read(vq);
 event_notifier_cleanup(notifier);
 }
 
@@ -573,6 +678,13 @@ static int virtio_pci_set_guest_notifiers(void *opaque, 
bool assign)
 VirtIODevice *vdev = proxy->vdev;
 int r, n;
 
+/* Must unset vector notifier while guest notifier is still assigned */
+if (kvm_irqchip_in_kernel() && !assign) {
+msix_unset_vector_notifiers(&proxy->pci_dev);
+g_free(proxy->vector_irqfd);
+proxy->vector_irqfd =

Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-21 Thread Amit Shah
On (Wed) 16 May 2012 [13:23:11], Anthony Liguori wrote:
> On 05/16/2012 12:21 PM, Amit Shah wrote:
> >On (Wed) 16 May 2012 [08:24:22], Anthony Liguori wrote:
> >>On 05/16/2012 06:30 AM, Amit Shah wrote:
> >>>The Linux kernel already has a virtio-rng driver, this is the device
> >>>implementation.
> >>>
> >>>When Linux needs more entropy, it puts a buffer in the vq.  We then put
> >>>entropy into that buffer, and push it back to the guest.
> >>>
> >>>Feeding randomness from host's /dev/urandom into the guest is
> >>>sufficient, so this is a simple implementation that opens /dev/urandom
> >>>and reads from it whenever required.
> >>>
> >>>Invocation is simple:
> >>>
> >>>   qemu ... -device virtio-rng-pci
> >>>
> >>>In the guest, we see
> >>>
> >>>   $ cat /sys/devices/virtual/misc/hw_random/rng_available
> >>>   virtio
> >>>
> >>>   $ cat /sys/devices/virtual/misc/hw_random/rng_current
> >>>   virtio
> >>>
> >>>There are ways to extend the device to be more generic and collect
> >>>entropy from other sources, but this is simple enough and works for now.
> >>>
> >>>Signed-off-by: Amit Shah
> >>
> >>It's not this simple unfortunately.
> >>
> >>If you did this with libvirt, one guest could exhaust the available
> >>entropy for the remaining guests.  This could be used as a mechanism
> >>for one guest to attack another (reducing the available entropy for
> >>key generation).
> >>
> >>You need to rate limit the amount of entropy that a guest can obtain
> >>to allow management tools to mitigate this attack.
> >
> >Hm, rate-limiting is a good point.  However, we're using /dev/urandom
> >here, which is nonblocking, and will keep on providing data as long as
> >we keep reading.
> 
> But you're still exhausting the entropy pool (which is a global
> resource). That's the problem.

I understand.  It's been shown, however, that /dev/urandom isn't
easily exhausted, and can be used as a reliable random source for
quite a few years without new seeding.  And even if a guest (or more)
is malicious, the guest doing such activities would itself continue to
generate some seed for the host's pool, strengthening /dev/urandom.

I don't know where to cite the data from, but I'll pass on that info
when I have a reference.

In the meantime, I'll add a rate-limiting option to the device, it
does seem like a good idea to implement nevertheless.

Amit



[Qemu-devel] [PATCH 06/16] qemu-option: qemu_opts_validate(): use error_set()

2012-05-21 Thread Luiz Capitulino
net_client_init() propagates the error up by calling qerror_report_err(),
because its users expect QError semantics.

Signed-off-by: Luiz Capitulino 
---
 net.c |6 +-
 qemu-option.c |   13 +
 qemu-option.h |2 +-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net.c b/net.c
index 1922d8a..f5d9cc7 100644
--- a/net.c
+++ b/net.c
@@ -1136,10 +1136,14 @@ int net_client_init(Monitor *mon, QemuOpts *opts, int 
is_netdev)
 for (i = 0; i < NET_CLIENT_TYPE_MAX; i++) {
 if (net_client_types[i].type != NULL &&
 !strcmp(net_client_types[i].type, type)) {
+Error *local_err = NULL;
 VLANState *vlan = NULL;
 int ret;
 
-if (qemu_opts_validate(opts, &net_client_types[i].desc[0]) == -1) {
+qemu_opts_validate(opts, &net_client_types[i].desc[0], &local_err);
+if (error_is_set(&local_err)) {
+qerror_report_err(local_err);
+error_free(local_err);
 return -1;
 }
 
diff --git a/qemu-option.c b/qemu-option.c
index 6d36970..eee3b45 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -1041,7 +1041,7 @@ QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
 /* Validate parsed opts against descriptions where no
  * descriptions were provided in the QemuOptsList.
  */
-int qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc)
+void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp)
 {
 QemuOpt *opt;
 Error *local_err = NULL;
@@ -1057,21 +1057,18 @@ int qemu_opts_validate(QemuOpts *opts, const 
QemuOptDesc *desc)
 }
 }
 if (desc[i].name == NULL) {
-qerror_report(QERR_INVALID_PARAMETER, opt->name);
-return -1;
+error_set(errp, QERR_INVALID_PARAMETER, opt->name);
+return;
 }
 
 opt->desc = &desc[i];
 
 qemu_opt_parse(opt, &local_err);
 if (error_is_set(&local_err)) {
-qerror_report_err(local_err);
-error_free(local_err);
-return -1;
+error_propagate(errp, local_err);
+return;
 }
 }
-
-return 0;
 }
 
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void 
*opaque,
diff --git a/qemu-option.h b/qemu-option.h
index 4d5b3d3..e9fbbb5 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -125,7 +125,7 @@ int qemu_opts_set(QemuOptsList *list, const char *id,
   const char *name, const char *value);
 const char *qemu_opts_id(QemuOpts *opts);
 void qemu_opts_del(QemuOpts *opts);
-int qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc);
+void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp);
 int qemu_opts_do_parse(QemuOpts *opts, const char *params, const char 
*firstname);
 QemuOpts *qemu_opts_parse(QemuOptsList *list, const char *params, int 
permit_abbrev);
 void qemu_opts_set_defaults(QemuOptsList *list, const char *params,
-- 
1.7.9.2.384.g4a92a




Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-21 Thread Amit Shah
On (Wed) 16 May 2012 [13:24:10], Anthony Liguori wrote:
> On 05/16/2012 12:26 PM, Amit Shah wrote:
> >On (Wed) 16 May 2012 [14:45:34], Daniel P. Berrange wrote:
> >>On Wed, May 16, 2012 at 08:24:22AM -0500, Anthony Liguori wrote:
> >>>On 05/16/2012 06:30 AM, Amit Shah wrote:
> The Linux kernel already has a virtio-rng driver, this is the device
> implementation.
> 
> When Linux needs more entropy, it puts a buffer in the vq.  We then put
> entropy into that buffer, and push it back to the guest.
> 
> Feeding randomness from host's /dev/urandom into the guest is
> sufficient, so this is a simple implementation that opens /dev/urandom
> and reads from it whenever required.
> 
> Invocation is simple:
> 
>    qemu ... -device virtio-rng-pci
> 
> In the guest, we see
> 
>    $ cat /sys/devices/virtual/misc/hw_random/rng_available
>    virtio
> 
>    $ cat /sys/devices/virtual/misc/hw_random/rng_current
>    virtio
> 
> There are ways to extend the device to be more generic and collect
> entropy from other sources, but this is simple enough and works for now.
> 
> Signed-off-by: Amit Shah
> >>>
> >>>It's not this simple unfortunately.
> >>>
> >>>If you did this with libvirt, one guest could exhaust the available
> >>>entropy for the remaining guests.  This could be used as a mechanism
> >>>for one guest to attack another (reducing the available entropy for
> >>>key generation).
> >>>
> >>>You need to rate limit the amount of entropy that a guest can obtain
> >>>to allow management tools to mitigate this attack.
> >>
> >>Ultimately I think you need to have a push mechanism, where an external
> >>process feeds entropy to QEMU, rather than a pull mechanism where QEMU
> >>grabs entropy itself.
> >
> >Yes, that's the goal.
> >
> >That was my first instinct as well.  However, we already have guests
> >which have the current virtio-rng driver that works only in pull
> >mode.  Also, Linux's hw-rng interface doesn't have a pull mechanism at
> >all -- it asks the h/w for more entropy when the OS is low on it.
> >
> >>I tend to think that virtio-rng should have a chardev backend associated
> >>with it. The driver should just read from this chardev to get its entropy.
> >
> >I even started with this approach.  Adapting the chardev layer to
> >actually read from a source of data, only when needed (for the
> >pull-based mechanism) quickly turned ugly.
> >
> >When we do get a push-based mechanism working, we'll then also have to
> >think of buffering the data from the daemon somewhere.  It's not going
> >to be ideal.
> 
> push == pull with flow control.  There's no need to implement a
> different guest interface.

Though, the current implementation can be easily extended -- if we do
go without a chardev for v1 for virtio-rng, with qemu directly opening
urandom and feeding entropy, libvirt just has to add a
'virtio-rng-pci' device.  When the ',chardev' property becomes
available, it can connect the chardev to itself or a daemon and start
feeding data.

> >>Either libvirtd, or better yet a separate virt-entropyd daemonm would
> >>connect to each guest&  feed the entropy into each guest according to
> >>some desired metrics.
> >
> >I'd prefer a separate daemon.  There already is egd, which we can use.
> >However, there are restrictions with certification (as always).
> >Adding new daemons to the mix increases complexity and the time it
> >takes for certification, so doing it in libvirt may end up to be the
> >preferred approach.
> 
> I don't think time to certify is a reasonable technical consideration.

Absolutely -- but if it's as easy to implement it in libvirt vs
another standalone program, the libvirt approach should be preferred
considering the above.

Amit



[Qemu-devel] [PATCH 06/19] kvm: Update kernel headers

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Corresponding kvm.git hash: f2569053e0

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 linux-headers/linux/kvm.h |   38 ++
 1 file changed, 38 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ee7bd9c..c4426ec 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -449,6 +449,30 @@ struct kvm_ppc_pvinfo {
__u8  pad[108];
 };
 
+/* for KVM_PPC_GET_SMMU_INFO */
+#define KVM_PPC_PAGE_SIZES_MAX_SZ  8
+
+struct kvm_ppc_one_page_size {
+   __u32 page_shift;   /* Page shift (or 0) */
+   __u32 pte_enc;  /* Encoding in the HPTE (>>12) */
+};
+
+struct kvm_ppc_one_seg_page_size {
+   __u32 page_shift;   /* Base page shift of segment (or 0) */
+   __u32 slb_enc;  /* SLB encoding for BookS */
+   struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
+};
+
+#define KVM_PPC_PAGE_SIZES_REAL0x0001
+#define KVM_PPC_1T_SEGMENTS0x0002
+
+struct kvm_ppc_smmu_info {
+   __u64 flags;
+   __u32 slb_size;
+   __u32 pad;
+   struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
+};
+
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
@@ -590,6 +614,8 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_SYNC_REGS 74
 #define KVM_CAP_PCI_2_3 75
 #define KVM_CAP_KVMCLOCK_CTRL 76
+#define KVM_CAP_SIGNAL_MSI 77
+#define KVM_CAP_PPC_GET_SMMU_INFO 78
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -715,6 +741,14 @@ struct kvm_one_reg {
__u64 addr;
 };
 
+struct kvm_msi {
+   __u32 address_lo;
+   __u32 address_hi;
+   __u32 data;
+   __u32 flags;
+   __u8  pad[16];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -789,6 +823,10 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PCI_2_3 */
 #define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa4, \
   struct kvm_assigned_pci_dev)
+/* Available with KVM_CAP_SIGNAL_MSI */
+#define KVM_SIGNAL_MSI_IOW(KVMIO,  0xa5, struct kvm_msi)
+/* Available with KVM_CAP_PPC_GET_SMMU_INFO */
+#define KVM_PPC_GET_SMMU_INFO_IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
 
 /*
  * ioctls for vcpu fds
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-21 Thread Amit Shah
On (Wed) 16 May 2012 [08:48:20], Anthony Liguori wrote:
> On 05/16/2012 08:45 AM, Daniel P. Berrange wrote:
> >On Wed, May 16, 2012 at 08:24:22AM -0500, Anthony Liguori wrote:
> >>On 05/16/2012 06:30 AM, Amit Shah wrote:
> >>>The Linux kernel already has a virtio-rng driver, this is the device
> >>>implementation.
> >>>
> >>>When Linux needs more entropy, it puts a buffer in the vq.  We then put
> >>>entropy into that buffer, and push it back to the guest.
> >>>
> >>>Feeding randomness from host's /dev/urandom into the guest is
> >>>sufficient, so this is a simple implementation that opens /dev/urandom
> >>>and reads from it whenever required.
> >>>
> >>>Invocation is simple:
> >>>
> >>>   qemu ... -device virtio-rng-pci
> >>>
> >>>In the guest, we see
> >>>
> >>>   $ cat /sys/devices/virtual/misc/hw_random/rng_available
> >>>   virtio
> >>>
> >>>   $ cat /sys/devices/virtual/misc/hw_random/rng_current
> >>>   virtio
> >>>
> >>>There are ways to extend the device to be more generic and collect
> >>>entropy from other sources, but this is simple enough and works for now.
> >>>
> >>>Signed-off-by: Amit Shah
> >>
> >>It's not this simple unfortunately.
> >>
> >>If you did this with libvirt, one guest could exhaust the available
> >>entropy for the remaining guests.  This could be used as a mechanism
> >>for one guest to attack another (reducing the available entropy for
> >>key generation).
> >>
> >>You need to rate limit the amount of entropy that a guest can obtain
> >>to allow management tools to mitigate this attack.
> >
> >Ultimately I think you need to have a push mechanism, where an external
> >process feeds entropy to QEMU, rather than a pull mechanism where QEMU
> >grabs entropy itself.
> 
> A previous patch didn't open urandom directly but instead talked to
> an entropy daemon.  This approach would allow libvirt to hand out
> entropy as it saw fit without requiring a new driver.

Using egd isn't necessary -- libvirt could still read urandom and pass
on to qemu at configured intervals.

Amit



[Qemu-devel] [PATCH 07/19] kvm: Add support for direct MSI injections

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

If the kernel supports KVM_SIGNAL_MSI, we can avoid the route-based
MSI injection mechanism.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 kvm-all.c |   22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index ff0534b..d222012 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -96,6 +96,7 @@ struct KVMState
 uint32_t *used_gsi_bitmap;
 unsigned int gsi_count;
 QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
+bool direct_msi;
 #endif
 };
 
@@ -898,8 +899,10 @@ static void kvm_init_irq_routing(KVMState *s)
 s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
 s->nr_allocated_irq_routes = 0;
 
-for (i = 0; i < KVM_MSI_HASHTAB_SIZE; i++) {
-QTAILQ_INIT(&s->msi_hashtab[i]);
+if (!s->direct_msi) {
+for (i = 0; i < KVM_MSI_HASHTAB_SIZE; i++) {
+QTAILQ_INIT(&s->msi_hashtab[i]);
+}
 }
 
 kvm_arch_init_irq_routing(s);
@@ -1005,7 +1008,7 @@ static int kvm_irqchip_get_virq(KVMState *s)
 
 return bit - 1 + i * 32;
 }
-if (retry) {
+if (!s->direct_msi && retry) {
 retry = false;
 kvm_flush_dynamic_msi_routes(s);
 goto again;
@@ -1031,8 +1034,19 @@ static KVMMSIRoute *kvm_lookup_msi_route(KVMState *s, 
MSIMessage msg)
 
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 {
+struct kvm_msi msi;
 KVMMSIRoute *route;
 
+if (s->direct_msi) {
+msi.address_lo = (uint32_t)msg.address;
+msi.address_hi = msg.address >> 32;
+msi.data = msg.data;
+msi.flags = 0;
+memset(msi.pad, 0, sizeof(msi.pad));
+
+return kvm_vm_ioctl(s, KVM_SIGNAL_MSI, &msi);
+}
+
 route = kvm_lookup_msi_route(s, msg);
 if (!route) {
 int virq, ret;
@@ -1209,6 +1223,8 @@ int kvm_init(void)
 s->pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);
 #endif
 
+s->direct_msi = (kvm_check_extension(s, KVM_CAP_SIGNAL_MSI) > 0);
+
 ret = kvm_arch_init(s);
 if (ret < 0) {
 goto err;
-- 
1.7.10.1




[Qemu-devel] [PATCH 01/19] kvm: Refactor KVMState::max_gsi to gsi_count

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Instead of the bitmap size, store the maximum of GSIs the kernel
support. Move the GSI limit assertion to the API function
kvm_irqchip_add_route and make it stricter.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 kvm-all.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9b73ccf..2d82d54 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -86,7 +86,7 @@ struct KVMState
 struct kvm_irq_routing *irq_routes;
 int nr_allocated_irq_routes;
 uint32_t *used_gsi_bitmap;
-unsigned int max_gsi;
+unsigned int gsi_count;
 #endif
 };
 
@@ -859,8 +859,6 @@ int kvm_irqchip_set_irq(KVMState *s, int irq, int level)
 #ifdef KVM_CAP_IRQ_ROUTING
 static void set_gsi(KVMState *s, unsigned int gsi)
 {
-assert(gsi < s->max_gsi);
-
 s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
 }
 
@@ -875,7 +873,7 @@ static void kvm_init_irq_routing(KVMState *s)
 /* Round up so we can search ints using ffs */
 gsi_bits = ALIGN(gsi_count, 32);
 s->used_gsi_bitmap = g_malloc0(gsi_bits / 8);
-s->max_gsi = gsi_bits;
+s->gsi_count = gsi_count;
 
 /* Mark any over-allocated bits as already in use */
 for (i = gsi_count; i < gsi_bits; i++) {
@@ -920,6 +918,8 @@ void kvm_irqchip_add_route(KVMState *s, int irq, int 
irqchip, int pin)
 {
 struct kvm_irq_routing_entry e;
 
+assert(pin < s->gsi_count);
+
 e.gsi = irq;
 e.type = KVM_IRQ_ROUTING_IRQCHIP;
 e.flags = 0;
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH 11/13] pci: Create common pcibios_err_to_errno

2012-05-21 Thread Konrad Rzeszutek Wilk
On Fri, May 11, 2012 at 04:56:44PM -0600, Alex Williamson wrote:
> For returning errors out to non-PCI code.  Re-name xen's version.
> 
> Signed-off-by: Alex Williamson 

Acked-by: Konrad Rzeszutek Wilk 
> ---
> 
>  drivers/xen/xen-pciback/conf_space.c |6 +++---
>  include/linux/pci.h  |   26 ++
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/xen-pciback/conf_space.c 
> b/drivers/xen/xen-pciback/conf_space.c
> index 30d7be0..46ae0f9 100644
> --- a/drivers/xen/xen-pciback/conf_space.c
> +++ b/drivers/xen/xen-pciback/conf_space.c
> @@ -124,7 +124,7 @@ static inline u32 merge_value(u32 val, u32 new_val, u32 
> new_val_mask,
>   return val;
>  }
>  
> -static int pcibios_err_to_errno(int err)
> +static int xen_pcibios_err_to_errno(int err)
>  {
>   switch (err) {
>   case PCIBIOS_SUCCESSFUL:
> @@ -202,7 +202,7 @@ out:
>  pci_name(dev), size, offset, value);
>  
>   *ret_val = value;
> - return pcibios_err_to_errno(err);
> + return xen_pcibios_err_to_errno(err);
>  }
>  
>  int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 
> value)
> @@ -290,7 +290,7 @@ int xen_pcibk_config_write(struct pci_dev *dev, int 
> offset, int size, u32 value)
>   }
>   }
>  
> - return pcibios_err_to_errno(err);
> + return xen_pcibios_err_to_errno(err);
>  }
>  
>  void xen_pcibk_config_free_dyn_fields(struct pci_dev *dev)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index b437225..20a8f2e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -467,6 +467,32 @@ static inline bool pci_dev_msi_enabled(struct pci_dev 
> *pci_dev) { return false;
>  #define PCIBIOS_SET_FAILED   0x88
>  #define PCIBIOS_BUFFER_TOO_SMALL 0x89
>  
> +/*
> + * Translate above to generic errno for passing back through non-pci.
> + */
> +static inline int pcibios_err_to_errno(int err)
> +{
> + if (err <= PCIBIOS_SUCCESSFUL)
> + return err; /* Assume already errno */
> +
> + switch (err) {
> + case PCIBIOS_FUNC_NOT_SUPPORTED:
> + return -ENOENT;
> + case PCIBIOS_BAD_VENDOR_ID:
> + return -EINVAL;
> + case PCIBIOS_DEVICE_NOT_FOUND:
> + return -ENODEV;
> + case PCIBIOS_BAD_REGISTER_NUMBER:
> + return -EFAULT;
> + case PCIBIOS_SET_FAILED:
> + return -EIO;
> + case PCIBIOS_BUFFER_TOO_SMALL:
> + return -ENOSPC;
> + }
> +
> + return -ENOTTY;
> +}
> +
>  /* Low-level architecture-dependent routines */
>  
>  struct pci_ops {



[Qemu-devel] Current differences between qemu --enable-kvm and qemu-kvm?

2012-05-21 Thread Erik Rull

Hi all,

is there a summary existing that shows up the rough or actual differences 
between qemu --enable-kvm and qemu-kvm? I tested both versions with the 
same compile and start options, the CPU performance results are identical, 
only the bootup time of my guest system with qemu-kvm seemed to be a bit 
faster (not measured, it just feeled so).


Thanks.

Best regards,

Erik



Re: [Qemu-devel] [PATCH V2] booke_206_tlbwe: Discard invalid bits in MAS2

2012-05-21 Thread Alexander Graf


On 21.05.2012, at 18:31, Andreas Färber  wrote:

> Am 21.05.2012 18:11, schrieb Fabien Chouteau:
>> The size of EPN field in MAS2 depends on page size. This patch adds a
>> mask to discard invalid bits in EPN field.
>> 
>> Definition of EPN field from e500v2 RM:
>> EPN Effective page number: Depending on page size, only the bits
>> associated with a page boundary are valid. Bits that represent offsets
>> within a page are ignored and should be cleared.
>> 
>> There is a similar (but more complicated) definition in PowerISA V2.06.
>> 
>> Signed-off-by: Fabien Chouteau 
>> ---
>> target-ppc/op_helper.c |   17 +++--
>> 1 file changed, 15 insertions(+), 2 deletions(-)
>> 
>> diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
>> index 4ef2332..481b51c 100644
>> --- a/target-ppc/op_helper.c
>> +++ b/target-ppc/op_helper.c
>> @@ -4227,6 +4227,8 @@ void helper_booke206_tlbwe(void)
>> uint32_t tlbncfg, tlbn;
>> ppcmas_tlb_t *tlb;
>> uint32_t size_tlb, size_ps;
>> +target_ulong mask;
>> +
>> 
>> switch (env->spr[SPR_BOOKE_MAS0] & MAS0_WQ_MASK) {
>> case MAS0_WQ_ALWAYS:
> 
> Minor nitpick: This adds a second white line.
> 
> More severely, this patch is not marked as for 1.1 and I believe
> op_helper.c is dropped in Blue's AREG0 conversion, so I would recommend
> to rebase on that, since rebasing the large code movements was kind of
> nasty. Now that we've fixed ppc and ppc64 TCG it could be applied to
> ppc-next, no?

Good point. Blue, mind to resend? :)

Alex




[Qemu-devel] [PATCH 15/16] qapi: convert netdev_add

2012-05-21 Thread Luiz Capitulino
This is not a full QAPI conversion, but an intermediate step.

In essence, do_netdev_add() is split into three functions:

 1. netdev_add(): performs the actual work. This function is fully
converted to Error (thus, it's "qapi-friendly")

 2. qmp_netdev_add(): the QMP front-end for netdev_add(). This is
coded by hand and not auto-generated (gen=no in the schema). The
reason for this it's a lot easier and simpler to with QemuOpts
this way

 3. hmp_netdev_add(): HMP front-end.

This design was suggested by Paolo Bonzini.

Signed-off-by: Luiz Capitulino 
---
 hmp-commands.hx  |3 +--
 hmp.c|   21 +
 hmp.h|1 +
 net.c|   36 
 net.h|3 ++-
 qapi-schema.json |   28 
 qmp-commands.hx  |5 +
 7 files changed, 78 insertions(+), 19 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81723c8..d0ce6a5 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1037,8 +1037,7 @@ ETEXI
 .args_type  = "netdev:O",
 .params = "[user|tap|socket],id=str[,prop=value][,...]",
 .help   = "add host network device",
-.user_print = monitor_user_noop,
-.mhandler.cmd_new = do_netdev_add,
+.mhandler.cmd = hmp_netdev_add,
 },
 
 STEXI
diff --git a/hmp.c b/hmp.c
index 42ced2a..7a4e25f 100644
--- a/hmp.c
+++ b/hmp.c
@@ -14,6 +14,8 @@
  */
 
 #include "hmp.h"
+#include "net.h"
+#include "qemu-option.h"
 #include "qemu-timer.h"
 #include "qmp-commands.h"
 
@@ -969,3 +971,22 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict 
*qdict)
   &errp);
 hmp_handle_error(mon, &errp);
 }
+
+void hmp_netdev_add(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+QemuOpts *opts;
+
+opts = qemu_opts_from_qdict(qemu_find_opts("netdev"), qdict, &err);
+if (error_is_set(&err)) {
+goto out;
+}
+
+netdev_add(opts, &err);
+if (error_is_set(&err)) {
+qemu_opts_del(opts);
+}
+
+out:
+hmp_handle_error(mon, &err);
+}
diff --git a/hmp.h b/hmp.h
index 5cf3241..017df87 100644
--- a/hmp.h
+++ b/hmp.h
@@ -62,5 +62,6 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
+void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/net.c b/net.c
index 8f7cb40..5f0c53c 100644
--- a/net.c
+++ b/net.c
@@ -1234,27 +1234,39 @@ void net_host_device_remove(Monitor *mon, const QDict 
*qdict)
 qemu_del_vlan_client(vc);
 }
 
-int do_netdev_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
+void netdev_add(QemuOpts *opts, Error **errp)
+{
+net_client_init(opts, 1, errp);
+}
+
+int qmp_netdev_add(Monitor *mon, const QDict *qdict, QObject **ret)
 {
 Error *local_err = NULL;
+QemuOptsList *opts_list;
 QemuOpts *opts;
-int res;
 
-opts = qemu_opts_from_qdict(qemu_find_opts("netdev"), qdict, &local_err);
-if (!opts) {
-qerror_report_err(local_err);
-error_free(local_err);
-return -1;
+opts_list = qemu_find_opts_err("netdev", &local_err);
+if (error_is_set(&local_err)) {
+goto exit_err;
 }
 
-res = net_client_init(opts, 1, &local_err);
-if (res < 0) {
-qerror_report_err(local_err);
-error_free(local_err);
+opts = qemu_opts_from_qdict(opts_list, qdict, &local_err);
+if (error_is_set(&local_err)) {
+goto exit_err;
+}
+
+netdev_add(opts, &local_err);
+if (error_is_set(&local_err)) {
 qemu_opts_del(opts);
+goto exit_err;
 }
 
-return res;
+return 0;
+
+exit_err:
+qerror_report_err(local_err);
+error_free(local_err);
+return -1;
 }
 
 int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
diff --git a/net.h b/net.h
index 7ee97e9..1eb9280 100644
--- a/net.h
+++ b/net.h
@@ -170,7 +170,8 @@ void net_check_clients(void);
 void net_cleanup(void);
 void net_host_device_add(Monitor *mon, const QDict *qdict);
 void net_host_device_remove(Monitor *mon, const QDict *qdict);
-int do_netdev_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
+void netdev_add(QemuOpts *opts, Error **errp);
+int qmp_netdev_add(Monitor *mon, const QDict *qdict, QObject **ret);
 int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
diff --git a/qapi-schema.json b/qapi-schema.json
index dfa0e1f..69fcd8e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1798,3 +1798,31 @@
 { 'command': 'dump-guest-memory',
   'data': { 'paging': 'bool', 'protocol': 'str', '*begin': 'int',
 '*length': 'int' } }
+##
+# @netdev_add:
+#
+# Add a network backend.
+#
+# @type: the type of network backend.  Current valid values are 'user', 'tap',
+#  

[Qemu-devel] [PATCH v2 3/3] Add rate limiting of RTC_CHANGE, BALLOON_CHANGE & WATCHDOG events

2012-05-21 Thread Daniel P. Berrange
From: "Daniel P. Berrange" 

Allow certain event types to be rate limited to avoid flooding
monitor clients. The monitor_protocol_event() method is changed
such that instead of immediately emitting the event to Monitor
instances, it will call a new monitor_protocol_event_queue()
method.

This will check to see if the rate limit for the event has been
exceeded, and if so schedule a timer to wakeup at the end of the
rate limit period. If further events arrive before the timer fires,
the previously queued event will be discarded in favour of the new
event. The event will eventually be emitted when the timer fires.

This logic is applied to RTC_CHANGE, BALLOON_CHANGE & WATCHDOG
events, since the data associated with these events is stateless

 * monitor.c: Add support for rate limiting
 * monitor.h: Define monitor_global_init for one-time setup tasks
 * vl.c: Invoke monitor_global_init
 * trace-events: Add hooks for monitor event tracing

Signed-off-by: Daniel P. Berrange 
---
 monitor.c|  162 +++--
 monitor.h|1 +
 trace-events |5 ++
 vl.c |2 +
 4 files changed, 164 insertions(+), 6 deletions(-)

diff --git a/monitor.c b/monitor.c
index 75fd4cf..5135966 100644
--- a/monitor.c
+++ b/monitor.c
@@ -66,6 +66,7 @@
 #include "memory.h"
 #include "qmp-commands.h"
 #include "hmp.h"
+#include "qemu-thread.h"
 
 /* for pic/irq_info */
 #if defined(TARGET_SPARC)
@@ -145,6 +146,19 @@ typedef struct MonitorControl {
 int command_mode;
 } MonitorControl;
 
+/*
+ * To prevent flooding clients, events can be throttled. The
+ * throttling is calculating globally, rather than per-Monitor
+ * instance.
+ */
+typedef struct MonitorEventState {
+MonitorEvent event; /* Event being tracked */
+int64_t rate;   /* Period over which to throttle. 0 to disable */
+int64_t last;   /* Time at which event was last emitted */
+QEMUTimer *timer;   /* Timer for handling delayed events */
+QObject *data;  /* Event pending delayed dispatch */
+} MonitorEventState;
+
 struct Monitor {
 CharDriverState *chr;
 int mux_out;
@@ -447,6 +461,141 @@ static const char *monitor_event_names[] = {
 };
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
 
+MonitorEventState monitor_event_state[QEVENT_MAX];
+QemuMutex monitor_event_state_lock;
+
+/*
+ * Emits the event to every monitor instance
+ */
+static void
+monitor_protocol_event_emit(MonitorEvent event,
+QObject *data)
+{
+Monitor *mon;
+
+trace_monitor_protocol_event_emit(event, data);
+QLIST_FOREACH(mon, &mon_list, entry) {
+if (monitor_ctrl_mode(mon) && qmp_cmd_mode(mon)) {
+monitor_json_emitter(mon, data);
+}
+}
+}
+
+
+/*
+ * Queue a new event for emission to Monitor instances,
+ * applying any rate limiting if required.
+ */
+static void
+monitor_protocol_event_queue(MonitorEvent event,
+ QObject *data)
+{
+MonitorEventState *evstate;
+int64_t now = qemu_get_clock_ns(rt_clock);
+assert(event < QEVENT_MAX);
+
+qemu_mutex_lock(&monitor_event_state_lock);
+evstate = &(monitor_event_state[event]);
+trace_monitor_protocol_event_queue(event,
+   data,
+   evstate->rate,
+   evstate->last,
+   now);
+
+/* Rate limit of 0 indicates no throttling */
+if (!evstate->rate) {
+monitor_protocol_event_emit(event, data);
+evstate->last = now;
+} else {
+int64_t delta = now - evstate->last;
+if (evstate->data ||
+delta < evstate->rate) {
+/* If there's an existing event pending, replace
+ * it with the new event, otherwise schedule a
+ * timer for delayed emission
+ */
+if (evstate->data) {
+qobject_decref(evstate->data);
+} else {
+int64_t then = evstate->last + evstate->rate;
+qemu_mod_timer_ns(evstate->timer, then);
+}
+evstate->data = data;
+qobject_incref(evstate->data);
+} else {
+monitor_protocol_event_emit(event, data);
+evstate->last = now;
+}
+}
+qemu_mutex_unlock(&monitor_event_state_lock);
+}
+
+
+/*
+ * The callback invoked by QemuTimer when a delayed
+ * event is ready to be emitted
+ */
+static void monitor_protocol_event_handler(void *opaque)
+{
+MonitorEventState *evstate = opaque;
+int64_t now = qemu_get_clock_ns(rt_clock);
+
+qemu_mutex_lock(&monitor_event_state_lock);
+
+trace_monitor_protocol_event_handler(evstate->event,
+ evstate->data,
+ evstate->last,
+ now);
+if (evstate->data) {
+monitor_

[Qemu-devel] [PATCH 12/19] kvm: Rename kvm_irqchip_add_route to kvm_irqchip_add_irq_route

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

We will add kvm_irqchip_add_msi_route, so let's make the difference
clearer.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/pc_piix.c |8 
 kvm-all.c|2 +-
 kvm.h|2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 6a75718..c17f906 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -63,17 +63,17 @@ static void kvm_piix3_setup_irq_routing(bool pci_enabled)
 if (i == 2) {
 continue;
 }
-kvm_irqchip_add_route(s, i, KVM_IRQCHIP_PIC_MASTER, i);
+kvm_irqchip_add_irq_route(s, i, KVM_IRQCHIP_PIC_MASTER, i);
 }
 for (i = 8; i < 16; ++i) {
-kvm_irqchip_add_route(s, i, KVM_IRQCHIP_PIC_SLAVE, i - 8);
+kvm_irqchip_add_irq_route(s, i, KVM_IRQCHIP_PIC_SLAVE, i - 8);
 }
 if (pci_enabled) {
 for (i = 0; i < 24; ++i) {
 if (i == 0) {
-kvm_irqchip_add_route(s, i, KVM_IRQCHIP_IOAPIC, 2);
+kvm_irqchip_add_irq_route(s, i, KVM_IRQCHIP_IOAPIC, 2);
 } else if (i != 2) {
-kvm_irqchip_add_route(s, i, KVM_IRQCHIP_IOAPIC, i);
+kvm_irqchip_add_irq_route(s, i, KVM_IRQCHIP_IOAPIC, i);
 }
 }
 }
diff --git a/kvm-all.c b/kvm-all.c
index 1913d6a..0117837 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -935,7 +935,7 @@ static void kvm_add_routing_entry(KVMState *s,
 set_gsi(s, entry->gsi);
 }
 
-void kvm_irqchip_add_route(KVMState *s, int irq, int irqchip, int pin)
+void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
 {
 struct kvm_irq_routing_entry e;
 
diff --git a/kvm.h b/kvm.h
index 7857dbf..8b061bd 100644
--- a/kvm.h
+++ b/kvm.h
@@ -134,7 +134,7 @@ void kvm_arch_init_irq_routing(KVMState *s);
 int kvm_irqchip_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
-void kvm_irqchip_add_route(KVMState *s, int gsi, int irqchip, int pin);
+void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
 int kvm_irqchip_commit_routes(KVMState *s);
 
 void kvm_put_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
-- 
1.7.10.1




[Qemu-devel] [PATCH 09/16] qemu-option: qemu_opts_from_qdict(): use error_set()

2012-05-21 Thread Luiz Capitulino
do_device_add() and do_netdev_add() call qerror_report_err() to maintain
their QError semantics.

Signed-off-by: Luiz Capitulino 
---
 hw/qdev-monitor.c |7 +--
 net.c |5 -
 qemu-option.c |   31 ---
 qemu-option.h |3 ++-
 4 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index eed781d..b01ef06 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -554,10 +554,13 @@ void do_info_qdm(Monitor *mon)
 
 int do_device_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
+Error *local_err = NULL;
 QemuOpts *opts;
 
-opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict);
-if (!opts) {
+opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &local_err);
+if (error_is_set(&local_err)) {
+qerror_report_err(local_err);
+error_free(local_err);
 return -1;
 }
 if (!monitor_cur_is_qmp() && qdev_device_help(opts)) {
diff --git a/net.c b/net.c
index f5d9cc7..246209f 100644
--- a/net.c
+++ b/net.c
@@ -1237,11 +1237,14 @@ void net_host_device_remove(Monitor *mon, const QDict 
*qdict)
 
 int do_netdev_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
+Error *local_err = NULL;
 QemuOpts *opts;
 int res;
 
-opts = qemu_opts_from_qdict(qemu_find_opts("netdev"), qdict);
+opts = qemu_opts_from_qdict(qemu_find_opts("netdev"), qdict, &local_err);
 if (!opts) {
+qerror_report_err(local_err);
+error_free(local_err);
 return -1;
 }
 
diff --git a/qemu-option.c b/qemu-option.c
index 1cddb9d..bb3886c 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -968,13 +968,19 @@ void qemu_opts_set_defaults(QemuOptsList *list, const 
char *params,
 assert(opts);
 }
 
+typedef struct OptsFromQDictState {
+QemuOpts *opts;
+Error **errp;
+} OptsFromQDictState;
+
 static void qemu_opts_from_qdict_1(const char *key, QObject *obj, void *opaque)
 {
+OptsFromQDictState *state = opaque;
 char buf[32];
 const char *value;
 int n;
 
-if (!strcmp(key, "id")) {
+if (!strcmp(key, "id") || error_is_set(state->errp)) {
 return;
 }
 
@@ -1002,7 +1008,8 @@ static void qemu_opts_from_qdict_1(const char *key, 
QObject *obj, void *opaque)
 default:
 return;
 }
-qemu_opt_set(opaque, key, value);
+
+qemu_opt_set_err(state->opts, key, value, state->errp);
 }
 
 /*
@@ -1011,21 +1018,31 @@ static void qemu_opts_from_qdict_1(const char *key, 
QObject *obj, void *opaque)
  * Only QStrings, QInts, QFloats and QBools are copied.  Entries with
  * other types are silently ignored.
  */
-QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict)
+QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
+   Error **errp)
 {
-QemuOpts *opts;
+OptsFromQDictState state;
 Error *local_err = NULL;
+QemuOpts *opts;
 
 opts = qemu_opts_create(list, qdict_get_try_str(qdict, "id"), 1,
 &local_err);
 if (error_is_set(&local_err)) {
-qerror_report_err(local_err);
-error_free(local_err);
+error_propagate(errp, local_err);
 return NULL;
 }
 
 assert(opts != NULL);
-qdict_iter(qdict, qemu_opts_from_qdict_1, opts);
+
+state.errp = &local_err;
+state.opts = opts;
+qdict_iter(qdict, qemu_opts_from_qdict_1, &state);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+qemu_opts_del(opts);
+return NULL;
+}
+
 return opts;
 }
 
diff --git a/qemu-option.h b/qemu-option.h
index c0e022b..951dec3 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -132,7 +132,8 @@ int qemu_opts_do_parse(QemuOpts *opts, const char *params, 
const char *firstname
 QemuOpts *qemu_opts_parse(QemuOptsList *list, const char *params, int 
permit_abbrev);
 void qemu_opts_set_defaults(QemuOptsList *list, const char *params,
 int permit_abbrev);
-QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict);
+QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
+   Error **errp);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 
 typedef int (*qemu_opts_loopfunc)(QemuOpts *opts, void *opaque);
-- 
1.7.9.2.384.g4a92a




[Qemu-devel] [PATCH 11/19] msix: Introduce vector notifiers

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Vector notifiers shall be triggered by the MSI/MSI-X core whenever a
relevant configuration change is programmed by the guest. In case of
MSI-X, changes are reported when the effective mask (global &&
per-vector) alters its state. On unmask, the current vector
configuration is included in the event report. This allows users - e.g.
virtio-pci layer - to transfer this information to external MSI-X
routing subsystems - like vhost + KVM in-kernel irqchip.

This implementation only provides MSI-X support, but extension to MSI is
feasible and will be provided later on when adding support for KVM PCI
device assignment.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 hw/msix.c |   93 +
 hw/msix.h |4 +++
 hw/pci.h  |8 ++
 3 files changed, 105 insertions(+)

diff --git a/hw/msix.c b/hw/msix.c
index e1a7d92..1622e16 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -139,13 +139,34 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
 return msix_vector_masked(dev, vector, dev->msix_function_masked);
 }
 
+static void msix_fire_vector_notifier(PCIDevice *dev,
+  unsigned int vector, bool is_masked)
+{
+MSIMessage msg;
+int ret;
+
+if (!dev->msix_vector_use_notifier) {
+return;
+}
+if (is_masked) {
+dev->msix_vector_release_notifier(dev, vector);
+} else {
+msg = msix_get_message(dev, vector);
+ret = dev->msix_vector_use_notifier(dev, vector, msg);
+assert(ret >= 0);
+}
+}
+
 static void msix_handle_mask_update(PCIDevice *dev, int vector, bool 
was_masked)
 {
 bool is_masked = msix_is_masked(dev, vector);
+
 if (is_masked == was_masked) {
 return;
 }
 
+msix_fire_vector_notifier(dev, vector, is_masked);
+
 if (!is_masked && msix_is_pending(dev, vector)) {
 msix_clr_pending(dev, vector);
 msix_notify(dev, vector);
@@ -330,6 +351,7 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
 void msix_load(PCIDevice *dev, QEMUFile *f)
 {
 unsigned n = dev->msix_entries_nr;
+unsigned int vector;
 
 if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
 return;
@@ -339,6 +361,10 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
 qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
 qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
 msix_update_function_masked(dev);
+
+for (vector = 0; vector < n; vector++) {
+msix_handle_mask_update(dev, vector, true);
+}
 }
 
 /* Does device support MSI-X? */
@@ -425,3 +451,70 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 return;
 msix_free_irq_entries(dev);
 }
+
+static int msix_set_notifier_for_vector(PCIDevice *dev, unsigned int vector)
+{
+MSIMessage msg;
+
+if (msix_is_masked(dev, vector)) {
+return 0;
+}
+msg = msix_get_message(dev, vector);
+return dev->msix_vector_use_notifier(dev, vector, msg);
+}
+
+static void msix_unset_notifier_for_vector(PCIDevice *dev, unsigned int vector)
+{
+if (msix_is_masked(dev, vector)) {
+return;
+}
+dev->msix_vector_release_notifier(dev, vector);
+}
+
+int msix_set_vector_notifiers(PCIDevice *dev,
+  MSIVectorUseNotifier use_notifier,
+  MSIVectorReleaseNotifier release_notifier)
+{
+int vector, ret;
+
+assert(use_notifier && release_notifier);
+
+dev->msix_vector_use_notifier = use_notifier;
+dev->msix_vector_release_notifier = release_notifier;
+
+if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
+(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+for (vector = 0; vector < dev->msix_entries_nr; vector++) {
+ret = msix_set_notifier_for_vector(dev, vector);
+if (ret < 0) {
+goto undo;
+}
+}
+}
+return 0;
+
+undo:
+while (--vector >= 0) {
+msix_unset_notifier_for_vector(dev, vector);
+}
+dev->msix_vector_use_notifier = NULL;
+dev->msix_vector_release_notifier = NULL;
+return ret;
+}
+
+void msix_unset_vector_notifiers(PCIDevice *dev)
+{
+int vector;
+
+assert(dev->msix_vector_use_notifier &&
+   dev->msix_vector_release_notifier);
+
+if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
+(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+for (vector = 0; vector < dev->msix_entries_nr; vector++) {
+msix_unset_notifier_for_vector(dev, vector);
+}
+}
+dev->msix_vector_use_notifier = NULL;
+dev->msix_vector_release_notifier = NULL;
+}
diff --git a/hw/msix.h b/hw/msix.h
index 5aba22b..f33f18b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,4 +29,8 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
+int msix_set_vector_notifiers(PCIDevice *dev,
+ 

Re: [Qemu-devel] [PATCH 1/2] pci: Add pci_device_get_host_irq

2012-05-21 Thread Michael S. Tsirkin
On Mon, May 21, 2012 at 10:13:47AM -0300, Jan Kiszka wrote:
> @@ -386,6 +387,14 @@ static void piix3_set_irq(void *opaque, int pirq, int 
> level)
>  piix3_set_irq_level(piix3, pirq, level);
>  }
>  
> +static int piix3_map_host_irq(void *opaque, int pci_intx)
> +{
> +PIIX3State *piix3 = opaque;
> +int host_irq = piix3->dev.config[PIIX_PIRQC + pci_intx];
> +
> +return host_irq < PIIX_NUM_PIC_IRQS ? host_irq : -1;
> +}
> +
>  /* irq routing is changed. so rebuild bitmap */
>  static void piix3_update_irq_levels(PIIX3State *piix3)
>  {


So, instead of special API just for assignment,
I would like to see map_irq in piix being reworked
to take dev config into account.
I think piix is almost unique in this but need to check,
if not fix other host buses that are programmable.
PCI bridges are all fixed routing.

Then we can drop set_irq callback.

Finally all devices can cache the irq#,
and piix would scan devices behind it
and update the irq.

Assignment then just needs a notifier, since
it owns the device just a pointer in device is
enough.

Could you look at doing this please?
If no I can give it a stub.

-- 
MST



[Qemu-devel] [PATCH 05/19] kvm: x86: Wire up MSI support for in-kernel irqchip

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Catch writes to the MSI MMIO region in the KVM APIC and forward them to
the kernel. Provide the kernel support GSI routing, this allows to
enable MSI support also for in-kernel irqchip mode.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 hw/kvm/apic.c |   34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
index ffe7a52..8ba4079 100644
--- a/hw/kvm/apic.c
+++ b/hw/kvm/apic.c
@@ -10,6 +10,7 @@
  * See the COPYING file in the top-level directory.
  */
 #include "hw/apic_internal.h"
+#include "hw/msi.h"
 #include "kvm.h"
 
 static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
@@ -145,10 +146,39 @@ static void kvm_apic_external_nmi(APICCommonState *s)
 run_on_cpu(s->cpu_env, do_inject_external_nmi, s);
 }
 
+static uint64_t kvm_apic_mem_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
+{
+return ~(uint64_t)0;
+}
+
+static void kvm_apic_mem_write(void *opaque, target_phys_addr_t addr,
+   uint64_t data, unsigned size)
+{
+MSIMessage msg = { .address = addr, .data = data };
+int ret;
+
+ret = kvm_irqchip_send_msi(kvm_state, msg);
+if (ret < 0) {
+fprintf(stderr, "KVM: injection failed, MSI lost (%s)\n",
+strerror(-ret));
+}
+}
+
+static const MemoryRegionOps kvm_apic_io_ops = {
+.read = kvm_apic_mem_read,
+.write = kvm_apic_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static void kvm_apic_init(APICCommonState *s)
 {
-memory_region_init_reservation(&s->io_memory, "kvm-apic-msi",
-   MSI_SPACE_SIZE);
+memory_region_init_io(&s->io_memory, &kvm_apic_io_ops, s, "kvm-apic-msi",
+  MSI_SPACE_SIZE);
+
+if (kvm_has_gsi_routing()) {
+msi_supported = true;
+}
 }
 
 static void kvm_apic_class_init(ObjectClass *klass, void *data)
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH 1/2] pci: Add pci_device_get_host_irq

2012-05-21 Thread Jan Kiszka
On 2012-05-21 14:34, Michael S. Tsirkin wrote:
> On Mon, May 21, 2012 at 10:13:47AM -0300, Jan Kiszka wrote:
>> Add a PCI IRQ path discovery function that walks from a given device to
>> the host bridge, returning the IRQ number that is reported to the
>> attached interrupt controller. For this purpose, another PCI bridge
>> callback function is introduced: map_host_irq. It is so far only
>> implemented by the PIIX3, other host bridges can be added later on as
>> required.
>>
>> Will be used for KVM PCI device assignment.
>>
>> Signed-off-by: Jan Kiszka 
> 
> interrupt injection is data path even for emulated devices.
> So instead of special casing device assignment I would like to see all
> devices converted to an API that caches irqs.
> 
> This will likely mean that we can maintain the final
> irq as part of the pci device structure, and
> this api will simply return it.

Yep, I definitely agree. It's just that such a design has to please even
more users than PCI devices, thus will likely take longer to settle than
the device assignment effort. Therefore I decided to rush forward with
an intermediate approach first.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Qemu-devel] [PATCH 08/19] kvm: Enable in-kernel irqchip support by default

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

As MSI is now fully supported by KVM (/wrt available features in
upstream), we can finally enable the in-kernel irqchip by default.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 kvm-all.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kvm-all.c b/kvm-all.c
index d222012..1913d6a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1099,7 +1099,7 @@ static int kvm_irqchip_create(KVMState *s)
 
 if (QTAILQ_EMPTY(&list->head) ||
 !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
-   "kernel_irqchip", false) ||
+   "kernel_irqchip", true) ||
 !kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
 return 0;
 }
-- 
1.7.10.1




[Qemu-devel] [PATCH v2 2/3] Add event notification for guest balloon changes

2012-05-21 Thread Daniel P. Berrange
From: "Daniel P. Berrange" 

After setting a balloon target value, applications have to
continually poll 'query-balloon' to determine whether the
guest has reacted to this request. The virtio-balloon backend
knows exactly when the guest has reacted though, and thus it
is possible to emit a JSON event to tell the mgmt application
whenever the guest balloon changes.

This introduces a new 'qemu_balloon_change()' API which is
to be called by balloon driver backends, whenever they have
a change in balloon value. This takes the 'actual' balloon
value, as would be found in the BalloonInfo struct.

The qemu_balloon_change API emits a JSON monitor event which
looks like:

  {"timestamp": {"seconds": 1337162462, "microseconds": 814521},
   "event": "BALLOON_CHANGE", "data": {"actual": 944766976}}

* balloon.c, balloon.h: Introduce qemu_balloon_change() for
  emitting balloon change events on the monitor
* hw/virtio-balloon.c: Invoke qemu_balloon_change() whenever
  the guest changes the balloon actual value
* monitor.c, monitor.h: Define QEVENT_BALLOON_CHANGE

Signed-off-by: Daniel P. Berrange 
---
 QMP/qmp-events.txt  |   18 ++
 balloon.c   |   14 ++
 balloon.h   |2 ++
 hw/virtio-balloon.c |5 +
 monitor.c   |1 +
 monitor.h   |1 +
 6 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 9286af5..9ba7079 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -335,3 +335,21 @@ Example:
"len": 10737418240, "offset": 134217728,
"speed": 0 },
  "timestamp": { "seconds": 1267061043, "microseconds": 959568 } }
+
+
+BALLOON_CHANGE
+--
+
+Emitted when the guest changes the actual BALLOON level. This
+value is equivalent to the 'actual' field return by the
+'query-balloon' command
+
+Data:
+
+- "actual": actual level of the guest memory balloon in bytes (json-number)
+
+Example:
+
+{ "event": "BALLOON_CHANGE",
+"data": { "actual": 944766976 },
+"timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
diff --git a/balloon.c b/balloon.c
index aa354f7..913862b 100644
--- a/balloon.c
+++ b/balloon.c
@@ -30,6 +30,7 @@
 #include "balloon.h"
 #include "trace.h"
 #include "qmp-commands.h"
+#include "qjson.h"
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
@@ -80,6 +81,19 @@ static int qemu_balloon_status(BalloonInfo *info)
 return 1;
 }
 
+void qemu_balloon_change(int64_t actual)
+{
+QObject *data;
+
+data = qobject_from_jsonf("{ 'actual': %" PRId64 " }",
+  actual);
+
+monitor_protocol_event(QEVENT_BALLOON_CHANGE, data);
+
+qobject_decref(data);
+}
+
+
 BalloonInfo *qmp_query_balloon(Error **errp)
 {
 BalloonInfo *info;
diff --git a/balloon.h b/balloon.h
index b60fd5d..2ebac0d 100644
--- a/balloon.h
+++ b/balloon.h
@@ -24,4 +24,6 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
 QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 
+void qemu_balloon_change(int64_t actual);
+
 #endif
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index ce9d2c9..9137573 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -146,8 +146,13 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 {
 VirtIOBalloon *dev = to_virtio_balloon(vdev);
 struct virtio_balloon_config config;
+uint32_t oldactual = dev->actual;
 memcpy(&config, config_data, 8);
 dev->actual = le32_to_cpu(config.actual);
+if (dev->actual != oldactual) {
+qemu_balloon_change(ram_size -
+(dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
+}
 }
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
diff --git a/monitor.c b/monitor.c
index a3bc2c7..75fd4cf 100644
--- a/monitor.c
+++ b/monitor.c
@@ -443,6 +443,7 @@ static const char *monitor_event_names[] = {
 [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
 [QEVENT_SUSPEND] = "SUSPEND",
 [QEVENT_WAKEUP] = "WAKEUP",
+[QEVENT_BALLOON_CHANGE] = "BALLOON_CHANGE",
 };
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
 
diff --git a/monitor.h b/monitor.h
index cd1d878..5f4de1b 100644
--- a/monitor.h
+++ b/monitor.h
@@ -41,6 +41,7 @@ typedef enum MonitorEvent {
 QEVENT_DEVICE_TRAY_MOVED,
 QEVENT_SUSPEND,
 QEVENT_WAKEUP,
+QEVENT_BALLOON_CHANGE,
 
 /* Add to 'monitor_event_names' array in monitor.c when
  * defining new events here */
-- 
1.7.7.6




[Qemu-devel] [PATCH v2 1/3] Add 'query-events' command to QMP to query async events

2012-05-21 Thread Daniel P. Berrange
From: "Daniel P. Berrange" 

Sometimes it is neccessary for an application to determine
whether a particular QMP event is available, so they can
decide whether to use compatibility code instead. This
introduces a new 'query-events' command to QMP to do just
that

 { "execute": "query-events" }
 {"return": [{"name": "WAKEUP"},
 {"name": "SUSPEND"},
 {"name": "DEVICE_TRAY_MOVED"},
 {"name": "BLOCK_JOB_CANCELLED"},
 {"name": "BLOCK_JOB_COMPLETED"},
 ...snip...
 {"name": "SHUTDOWN"}]}

* monitor.c: Turn MonitorEvent -> string conversion
  into a lookup from a static table of constant strings.
  Add impl of qmp_query_events monitor command handler
* qapi-schema.json, qmp-commands.hx: Define contract of
  query-events command

Signed-off-by: Daniel P. Berrange 
---
 monitor.c|  107 +++---
 monitor.h|4 ++
 qapi-schema.json |   22 +++
 qmp-commands.hx  |   37 +++
 4 files changed, 108 insertions(+), 62 deletions(-)

diff --git a/monitor.c b/monitor.c
index 12a6fe2..a3bc2c7 100644
--- a/monitor.c
+++ b/monitor.c
@@ -422,6 +422,30 @@ static void timestamp_put(QDict *qdict)
 qdict_put_obj(qdict, "timestamp", obj);
 }
 
+
+static const char *monitor_event_names[] = {
+[QEVENT_SHUTDOWN] = "SHUTDOWN",
+[QEVENT_RESET] = "RESET",
+[QEVENT_POWERDOWN] = "POWERDOWN",
+[QEVENT_STOP] = "STOP",
+[QEVENT_RESUME] = "RESUME",
+[QEVENT_VNC_CONNECTED] = "VNC_CONNECTED",
+[QEVENT_VNC_INITIALIZED] = "VNC_INITIALIZED",
+[QEVENT_VNC_DISCONNECTED] = "VNC_DISCONNECTED",
+[QEVENT_BLOCK_IO_ERROR] = "BLOCK_IO_ERROR",
+[QEVENT_RTC_CHANGE] = "RTC_CHANGE",
+[QEVENT_WATCHDOG] = "WATCHDOG",
+[QEVENT_SPICE_CONNECTED] = "SPICE_CONNECTED",
+[QEVENT_SPICE_INITIALIZED] = "SPICE_INITIALIZED",
+[QEVENT_SPICE_DISCONNECTED] = "SPICE_DISCONNECTED",
+[QEVENT_BLOCK_JOB_COMPLETED] = "BLOCK_JOB_COMPLETED",
+[QEVENT_BLOCK_JOB_CANCELLED] = "BLOCK_JOB_CANCELLED",
+[QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
+[QEVENT_SUSPEND] = "SUSPEND",
+[QEVENT_WAKEUP] = "WAKEUP",
+};
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
+
 /**
  * monitor_protocol_event(): Generate a Monitor event
  *
@@ -435,68 +459,8 @@ void monitor_protocol_event(MonitorEvent event, QObject 
*data)
 
 assert(event < QEVENT_MAX);
 
-switch (event) {
-case QEVENT_SHUTDOWN:
-event_name = "SHUTDOWN";
-break;
-case QEVENT_RESET:
-event_name = "RESET";
-break;
-case QEVENT_POWERDOWN:
-event_name = "POWERDOWN";
-break;
-case QEVENT_STOP:
-event_name = "STOP";
-break;
-case QEVENT_RESUME:
-event_name = "RESUME";
-break;
-case QEVENT_VNC_CONNECTED:
-event_name = "VNC_CONNECTED";
-break;
-case QEVENT_VNC_INITIALIZED:
-event_name = "VNC_INITIALIZED";
-break;
-case QEVENT_VNC_DISCONNECTED:
-event_name = "VNC_DISCONNECTED";
-break;
-case QEVENT_BLOCK_IO_ERROR:
-event_name = "BLOCK_IO_ERROR";
-break;
-case QEVENT_RTC_CHANGE:
-event_name = "RTC_CHANGE";
-break;
-case QEVENT_WATCHDOG:
-event_name = "WATCHDOG";
-break;
-case QEVENT_SPICE_CONNECTED:
-event_name = "SPICE_CONNECTED";
-break;
-case QEVENT_SPICE_INITIALIZED:
-event_name = "SPICE_INITIALIZED";
-break;
-case QEVENT_SPICE_DISCONNECTED:
-event_name = "SPICE_DISCONNECTED";
-break;
-case QEVENT_BLOCK_JOB_COMPLETED:
-event_name = "BLOCK_JOB_COMPLETED";
-break;
-case QEVENT_BLOCK_JOB_CANCELLED:
-event_name = "BLOCK_JOB_CANCELLED";
-break;
-case QEVENT_DEVICE_TRAY_MOVED:
- event_name = "DEVICE_TRAY_MOVED";
-break;
-case QEVENT_SUSPEND:
-event_name = "SUSPEND";
-break;
-case QEVENT_WAKEUP:
-event_name = "WAKEUP";
-break;
-default:
-abort();
-break;
-}
+event_name = monitor_event_names[event];
+assert(event_name != NULL);
 
 qmp = qdict_new();
 timestamp_put(qmp);
@@ -738,6 +702,25 @@ CommandInfoList *qmp_query_commands(Error **errp)
 return cmd_list;
 }
 
+EventInfoList *qmp_query_events(Error **errp)
+{
+EventInfoList *info, *ev_list = NULL;
+MonitorEvent e;
+
+for (e = 0 ; e < QEVENT_MAX ; e++) {
+const char *event_name = monitor_event_names[e];
+assert(event_name != NULL);
+info = g_malloc0(sizeof(*info));
+info->value = g_malloc0(sizeof(*info->value));
+info->value->name = g_strdup(ev

[Qemu-devel] [PATCH 17/19] kvm: Enable use of kvm_irqchip_in_kernel in hwlib code

2012-05-21 Thread Avi Kivity
From: Jan Kiszka 

Provide a dummy kvm_kernel_irqchip so that kvm_irqchip_in_kernel can be
used by code that is not under CONFIG_KVM protection.

Signed-off-by: Jan Kiszka 
Signed-off-by: Avi Kivity 
---
 kvm-stub.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/kvm-stub.c b/kvm-stub.c
index b4cf03f..ec9a364 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -18,6 +18,7 @@
 #include "kvm.h"
 
 KVMState *kvm_state;
+bool kvm_kernel_irqchip;
 
 int kvm_init_vcpu(CPUArchState *env)
 {
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH 0/3] tighten scope of accidentally global variables

2012-05-21 Thread Jim Meyering
Blue Swirl wrote:
> On Mon, May 21, 2012 at 6:10 PM, Jim Meyering  wrote:
>> Blue Swirl wrote:
>>> On Mon, May 21, 2012 at 10:03 AM, Jim Meyering  wrote:
 From: Jim Meyering 

 I noticed this commit,

    virtio-pci: add missing 'static'

 which made this change:

    > -const MemoryRegionPortio virtio_portio[] = {
    > +static const MemoryRegionPortio virtio_portio[] = {

 and wondered if there were other variables like that.
 The following command shows that there are:
 [note that there are probably more: this finds only those
  for which the variable name appears in only one source file. ]
>>>
>>> Also, only for files at the top level.
>>
>> Thanks.  There were so many .o files, I assumed that all were at the top.
>> Searching all .o files, I found many more:
>>
>>  $ for i in $(nm -e $(find . -name '*.o')|sed -n 's/.* [BCDGRS] //p'|sort 
>> -u);\
>>    do test $(git grep -lw $i|wc -l) = 1 && echo $i;done
>
> How about just checking the executables (*-softmmu/qemu-system-xyz and
> linux-user/qemu-xyz)?

If I understood what you suggest, that would seem to miss only "test_image":
(README-scope-vars is the result of the one above using find and *.o)

$ for i in $(nm -e *-softmmu/qemu-system-* linux-user/qemu-*|sed -n 's/.* 
[BCDGRS] //p'|sort -u); do test $(git grep -lw $i|wc -l) = 1 && echo $i;done > k
nm: linux-user/qemu-types.h: File format not recognized
[Exit 1]
$ diff -u k README-scope-vars
--- k   2012-05-21 20:38:15.912149444 +0200
+++ README-scope-vars   2012-05-21 20:12:13.270023293 +0200
@@ -53,6 +62,7 @@ rtas_next
 s390_virtio_bus_info
 sh_table
 special_register_prefix
+test_image
 timers_state
 v9fs_synth_root
 virtcon_hds



  1   2   3   >