date:20160303

[Qemu-devel] [PULL 14/16] fdc: add function to determine drive chs limits

2016-03-03 Thread Michael S. Tsirkin

From: Roman Kagan 

When populating ACPI objects for floppy drives one needs to provide the
maximum values for cylinder, sector, and head number the drive supports.

This patch adds a function that iterates through the array of predefined
floppy drive formats and returns the maximum values of c, h, s, out of
those matching the given floppy drive type.
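
For illustration, the lookup described above can be sketched outside of
QEMU as follows. The type names and the format table here are hypothetical
stand-ins (not QEMU's FDFormat or fd_formats), and the guard against
decrementing a zero cylinder count is an addition of this sketch, not
something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's FloppyDriveType and FDFormat. */
typedef enum { DRIVE_144, DRIVE_288, DRIVE_120, DRIVE_NONE } DriveType;

typedef struct {
    DriveType drive;
    uint8_t last_sect;  /* sectors per track */
    uint8_t max_track;  /* number of tracks (cylinders) */
    uint8_t max_head;   /* highest head number */
} Format;

/* Illustrative values only; the table ends with a DRIVE_NONE entry. */
static const Format formats[] = {
    { DRIVE_144, 18, 80, 1 },
    { DRIVE_144, 20, 80, 1 },
    { DRIVE_144, 21, 82, 1 },
    { DRIVE_288, 36, 80, 1 },
    { DRIVE_120, 15, 80, 1 },
    { DRIVE_NONE, 0, 0, 0 },
};

static void drive_max_chs(DriveType type,
                          uint8_t *maxc, uint8_t *maxh, uint8_t *maxs)
{
    const Format *f;

    assert(maxc != NULL && maxh != NULL && maxs != NULL);
    *maxc = *maxh = *maxs = 0;
    for (f = formats; f->drive != DRIVE_NONE; f++) {
        if (f->drive != type) {
            continue;
        }
        if (*maxc < f->max_track) {
            *maxc = f->max_track;
        }
        if (*maxh < f->max_head) {
            *maxh = f->max_head;
        }
        if (*maxs < f->last_sect) {
            *maxs = f->last_sect;
        }
    }
    /* Convert a track count into the highest cylinder number. The guard
     * avoids wrapping to 255 when no format matched (sketch-only choice). */
    if (*maxc > 0) {
        (*maxc)--;
    }
}
```

With this table, a DRIVE_144 query yields the per-field maxima across all
three matching formats rather than the values of any single format.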

Signed-off-by: Roman Kagan 
Cc: Igor Mammedov 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: John Snow 
Cc: Laszlo Ersek 
Cc: Kevin O'Connor 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: John Snow 
---
 include/hw/block/fdc.h |  2 ++
 hw/block/fdc.c | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index adce14f..1749dab 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -15,5 +15,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
DriveInfo **fds, qemu_irq *fdc_tc);
 
 FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
+void isa_fdc_get_drive_max_chs(FloppyDriveType type,
+   uint8_t *maxc, uint8_t *maxh, uint8_t *maxs);
 
 #endif
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 9838d21..fc3aef9 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2557,6 +2557,29 @@ FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
 return isa->state.drives[i].drive;
 }
 
+void isa_fdc_get_drive_max_chs(FloppyDriveType type,
+   uint8_t *maxc, uint8_t *maxh, uint8_t *maxs)
+{
+const FDFormat *fdf;
+
+*maxc = *maxh = *maxs = 0;
+for (fdf = fd_formats; fdf->drive != FLOPPY_DRIVE_TYPE_NONE; fdf++) {
+if (fdf->drive != type) {
+continue;
+}
+if (*maxc < fdf->max_track) {
+*maxc = fdf->max_track;
+}
+if (*maxh < fdf->max_head) {
+*maxh = fdf->max_head;
+}
+if (*maxs < fdf->last_sect) {
+*maxs = fdf->last_sect;
+}
+}
+(*maxc)--;
+}
+
 static const VMStateDescription vmstate_isa_fdc ={
 .name = "fdc",
 .version_id = 2,
-- 
MST

[Qemu-devel] [PULL 10/16] vhost-user: verify that number of queues is less than MAX_QUEUE_NUM

2016-03-03 Thread Michael S. Tsirkin

From: Ilya Maximets 

Fix a QEMU crash when -netdev vhost-user,queues=n is passed with a number
of queues greater than MAX_QUEUE_NUM.
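
The range check can be sketched in isolation like this. The function name,
the error buffer convention and the limit value (1024) are assumptions made
for the sketch; the real code reports through error_setg() and uses QEMU's
own MAX_QUEUE_NUM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed limit for the sketch; QEMU defines its own MAX_QUEUE_NUM. */
#define SKETCH_MAX_QUEUE_NUM 1024

/* Returns the effective queue count, or -1 with a message in err. */
static int sketch_check_queues(int has_queues, int requested,
                               char *err, size_t errlen)
{
    /* An absent "queues" option defaults to a single queue. */
    int queues = has_queues ? requested : 1;

    assert(err != NULL && errlen > 0);
    err[0] = '\0';
    if (queues < 1 || queues > SKETCH_MAX_QUEUE_NUM) {
        snprintf(err, errlen,
                 "vhost-user number of queues must be in range [1, %d]",
                 SKETCH_MAX_QUEUE_NUM);
        return -1;
    }
    return queues;
}
```

Checking both bounds before any per-queue allocation is what keeps an
oversized value from reaching code that indexes fixed-size arrays.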

Signed-off-by: Ilya Maximets 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Jason Wang 
---
 net/vhost-user.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 451dbbf..b753b3d 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -317,9 +317,10 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
 }
 
 queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
-if (queues < 1) {
+if (queues < 1 || queues > MAX_QUEUE_NUM) {
 error_setg(errp,
-   "vhost-user number of queues must be bigger than zero");
+   "vhost-user number of queues must be in range [1, %d]",
+   MAX_QUEUE_NUM);
 return -1;
 }
 
-- 
MST

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-03 Thread Roman Kagan

On Thu, Mar 03, 2016 at 05:46:15PM +, Dr. David Alan Gilbert wrote:
> * Liang Li (liang.z...@intel.com) wrote:
> > The current QEMU live migration implementation marks all of the
> > guest's RAM pages as dirty in the RAM bulk stage. All of these pages
> > are then processed, which takes quite a lot of CPU cycles.
> > 
> > From the guest's point of view, the content of free pages does not
> > matter. We can make use of this fact and skip processing the free
> > pages in the RAM bulk stage. This saves a lot of CPU cycles, reduces
> > network traffic significantly, and noticeably speeds up the live
> > migration process.
> > 
> > This patch set is the QEMU side implementation.
> > 
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> > 
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the RAM bulk stage. This makes
> > the live migration process much more efficient.
> 
> Hi,
>   An interesting solution; I know a few different people have been looking
> at how to speed up ballooned VM migration.
> 
>   I wonder if it would be possible to avoid the kernel changes by
> parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
> mapped pages in the guest ram, would it achieve the same result?

Yes I was about to suggest the same thing: it's simple and makes use of
the existing infrastructure.  And you wouldn't need to care if the pages
were unmapped by ballooning or anything else (alternative balloon
implementations, not yet touched by the guest, etc.).  Besides, you
wouldn't need to synchronize with the guest.
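
A minimal sketch of the pagemap approach, assuming the documented Linux
layout: one 64-bit entry per virtual page, bit 63 set when the page is
present in RAM and bit 62 set when it is swapped out. The helper names are
hypothetical. This only identifies pages with no backing at all; spotting
pages mapped to the shared zero page would need more work and is not shown.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT (UINT64_C(1) << 63)  /* page is present in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62)  /* page is swapped out */

/* A page with neither bit set has no backing and could be skipped. */
static bool pagemap_entry_backed(uint64_t entry)
{
    return (entry & (PM_PRESENT | PM_SWAPPED)) != 0;
}

/* Byte offset of the entry describing vaddr inside the pagemap file. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
    return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/*
 * Read one entry; fd is expected to be an open descriptor for
 * /proc/self/pagemap. Returns 0 on success, -1 on error or short read.
 */
static int pagemap_read_entry(int fd, uintptr_t vaddr, long page_size,
                              uint64_t *entry)
{
    ssize_t n;

    assert(entry != NULL && page_size > 0);
    n = pread(fd, entry, sizeof(*entry), pagemap_offset(vaddr, page_size));
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would walk the host virtual range backing guest RAM one page at a
time and treat every page for which pagemap_entry_backed() is false as a
candidate to skip during the bulk stage.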

Roman.

[Qemu-devel] [PULL 09/16] virtio-balloon: add 'available' counter

2016-03-03 Thread Michael S. Tsirkin

From: "Denis V. Lunev" 

The patch for the kernel part is in linux-next already:
commit ac88e7c908b920866e529862f2b2f0129b254ab2
Author: Igor Redko 
Date:   Thu Feb 18 09:23:01 2016 +1100

virtio_balloon: export 'available' memory to balloon statistics

Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.

Signed-off-by: Denis V. Lunev 
CC: Igor Redko 
CC: Michael S. Tsirkin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/standard-headers/linux/virtio_balloon.h | 3 ++-
 hw/virtio/virtio-balloon.c  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 2e2a6dc..0df7c2e 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -51,7 +51,8 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_MINFLT   3   /* Number of minor faults */
 #define VIRTIO_BALLOON_S_MEMFREE  4   /* Total amount of free memory */
 #define VIRTIO_BALLOON_S_MEMTOT   5   /* Total amount of memory */
-#define VIRTIO_BALLOON_S_NR   6
+#define VIRTIO_BALLOON_S_AVAIL6   /* Amount of available memory in guest */
+#define VIRTIO_BALLOON_S_NR   7
 
 /*
  * Memory statistics structure.
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 64367ac..31c1aec 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -53,6 +53,7 @@ static const char *balloon_stat_names[] = {
[VIRTIO_BALLOON_S_MINFLT] = "stat-minor-faults",
[VIRTIO_BALLOON_S_MEMFREE] = "stat-free-memory",
[VIRTIO_BALLOON_S_MEMTOT] = "stat-total-memory",
+   [VIRTIO_BALLOON_S_AVAIL] = "stat-available-memory",
[VIRTIO_BALLOON_S_NR] = NULL
 };
 
-- 
MST

[Qemu-devel] [PULL 16/16] i386: update expected DSDT

2016-03-03 Thread Michael S. Tsirkin

DSDT was changed by:
commit 95cad0a1974a07f91b6f85324dfe3e18ee27b30a ("i386: populate floppy
drive information in DSDT").

Update expected files accordingly.

Signed-off-by: Michael S. Tsirkin 
---
 tests/acpi-test-data/pc/DSDT | Bin 5478 -> 5527 bytes
 tests/acpi-test-data/pc/DSDT.bridge  | Bin 7337 -> 7386 bytes
 tests/acpi-test-data/q35/DSDT| Bin 8321 -> 8233 bytes
 tests/acpi-test-data/q35/DSDT.bridge | Bin 8338 -> 8250 bytes
 4 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/tests/acpi-test-data/pc/DSDT b/tests/acpi-test-data/pc/DSDT
index ec0e642b06967d3ec4d7a7d252b82b916f37a5e0..79e3c7d9f71dd9495a4c31d24f38febd8e25bbcb 100644
GIT binary patch
delta 150
zcmaE+HC>y_CDB8f9ft_J;CX=$Sp_@;DBS*ZWOArG`yqk-skb(dM7b}-PmjNT!dM+j|
nRUj7PlIHr)mBaOoi=T@Fq{>xLoJ)j}0TnPYFl_E*x+??#m+>Bm

delta 103
zcmbQP{Y;C?CD$H_waP#alF9JAR-YT9OB4O08*5!xDytEK9ifCwnm|Z{Evv
GO9%jy

[Qemu-devel] [PULL 15/16] i386: populate floppy drive information in DSDT

2016-03-03 Thread Michael S. Tsirkin

From: Roman Kagan 

On x86-based systems Linux determines the presence and the type of
floppy drives via a query of a CMOS field.  So does SeaBIOS when
populating the return data for int 0x13 function 0x08.

However Windows doesn't do it. Instead, it requests this information
from BIOS via int 0x13/0x08 or through ACPI objects _FDE (Floppy Drive
Enumerate) and _FDI (Floppy Drive Information) of the floppy controller
object.  On UEFI systems only ACPI-based detection is supported.

QEMU doesn't provide those objects in its ACPI tables and as a result
floppy drives are invisible to Windows on UEFI/OVMF.

This patch adds those objects to the floppy controller in DSDT,
populating them with the information from respective QEMU objects.
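
As a rough illustration of the _FDE payload this patch builds: five
little-endian dwords, the first four flagging the presence of drives #0 to
#3 and the fifth describing tape presence (2 meaning never present, as in
the patch). The helper names below are hypothetical; QEMU itself fills a
uint32_t array with cpu_to_le32() and wraps it with aml_buffer().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FDE_MAX_FD 4                    /* drive slots described by _FDE */
#define FDE_DWORDS (FDE_MAX_FD + 1)     /* plus one dword for the tape */
#define FDE_TAPE_NEVER_PRESENT 2

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill a 20-byte buffer from per-drive presence flags. */
static void build_fde(const bool present[FDE_MAX_FD],
                      uint8_t out[FDE_DWORDS * 4])
{
    int i;

    assert(present != NULL && out != NULL);
    for (i = 0; i < FDE_MAX_FD; i++) {
        put_le32(out + 4 * i, present[i] ? 1 : 0);
    }
    put_le32(out + 4 * FDE_MAX_FD, FDE_TAPE_NEVER_PRESENT);
}
```

Writing the bytes explicitly keeps the layout independent of host
endianness, which is the same concern cpu_to_le32() addresses in the patch.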

Signed-off-by: Roman Kagan 
Cc: Igor Mammedov 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: John Snow 
Cc: Laszlo Ersek 
Cc: Kevin O'Connor 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/acpi-build.c | 69 +---
 1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1560c75..db4ede1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -37,6 +37,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/loader.h"
 #include "hw/isa/isa.h"
+#include "hw/block/fdc.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/mem/nvdimm.h"
 #include "sysemu/tpm.h"
@@ -1228,11 +1229,60 @@ static void build_hpet_aml(Aml *table)
 aml_append(table, scope);
 }
 
-static Aml *build_fdc_device_aml(void)
+static Aml *build_fdinfo_aml(int idx, FloppyDriveType type)
 {
+Aml *dev, *fdi;
+uint8_t maxc, maxh, maxs;
+
+isa_fdc_get_drive_max_chs(type, &maxc, &maxh, &maxs);
+
+dev = aml_device("FLP%c", 'A' + idx);
+
+aml_append(dev, aml_name_decl("_ADR", aml_int(idx)));
+
+fdi = aml_package(16);
+aml_append(fdi, aml_int(idx));  /* Drive Number */
+aml_append(fdi,
+aml_int(cmos_get_fd_drive_type(type)));  /* Device Type */
+/*
+ * the values below are the limits of the drive, and are thus independent
+ * of the inserted media
+ */
+aml_append(fdi, aml_int(maxc));  /* Maximum Cylinder Number */
+aml_append(fdi, aml_int(maxs));  /* Maximum Sector Number */
+aml_append(fdi, aml_int(maxh));  /* Maximum Head Number */
+/*
+ * SeaBIOS returns the below values for int 0x13 func 0x08 regardless of
+ * the drive type, so shall we
+ */
+aml_append(fdi, aml_int(0xAF));  /* disk_specify_1 */
+aml_append(fdi, aml_int(0x02));  /* disk_specify_2 */
+aml_append(fdi, aml_int(0x25));  /* disk_motor_wait */
+aml_append(fdi, aml_int(0x02));  /* disk_sector_siz */
+aml_append(fdi, aml_int(0x12));  /* disk_eot */
+aml_append(fdi, aml_int(0x1B));  /* disk_rw_gap */
+aml_append(fdi, aml_int(0xFF));  /* disk_dtl */
+aml_append(fdi, aml_int(0x6C));  /* disk_formt_gap */
+aml_append(fdi, aml_int(0xF6));  /* disk_fill */
+aml_append(fdi, aml_int(0x0F));  /* disk_head_sttl */
+aml_append(fdi, aml_int(0x08));  /* disk_motor_strt */
+
+aml_append(dev, aml_name_decl("_FDI", fdi));
+return dev;
+}
+
+static Aml *build_fdc_device_aml(ISADevice *fdc)
+{
+int i;
 Aml *dev;
 Aml *crs;
 
+#define ACPI_FDE_MAX_FD 4
+uint32_t fde_buf[5] = {
+0, 0, 0, 0, /* presence of floppy drives #0 - #3 */
+cpu_to_le32(2)  /* tape presence (2 == never present) */
+};
+
 dev = aml_device("FDC0");
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0700")));
 
@@ -1244,6 +1294,17 @@ static Aml *build_fdc_device_aml(void)
 aml_dma(AML_COMPATIBILITY, AML_NOTBUSMASTER, AML_TRANSFER8, 2));
 aml_append(dev, aml_name_decl("_CRS", crs));
 
+for (i = 0; i < MIN(MAX_FD, ACPI_FDE_MAX_FD); i++) {
+FloppyDriveType type = isa_fdc_get_drive_type(fdc, i);
+
+if (type < FLOPPY_DRIVE_TYPE_NONE) {
+fde_buf[i] = cpu_to_le32(1);  /* drive present */
+aml_append(dev, build_fdinfo_aml(i, type));
+}
+}
+aml_append(dev, aml_name_decl("_FDE",
+   aml_buffer(sizeof(fde_buf), (uint8_t *)fde_buf)));
+
 return dev;
 }
 
@@ -1388,13 +1449,15 @@ static Aml *build_com_device_aml(uint8_t uid)
 
 static void build_isa_devices_aml(Aml *table)
 {
+ISADevice *fdc = pc_find_fdc0();
+
 Aml *scope = aml_scope("_SB.PCI0.ISA");
 
 aml_append(scope, build_rtc_device_aml());
 aml_append(scope, build_kbd_device_aml());
 aml_append(scope, build_mouse_device_aml());
-if (pc_find_fdc0()) {
-aml_append(scope, build_fdc_device_aml());
+if (fdc) {
+aml_append(scope, build_fdc_device_aml(fdc));
 }
 aml_append(scope, build_lpt_device_aml());

[Qemu-devel] [PULL 07/16] hw/virtio: group virtio flags into an enum

2016-03-03 Thread Michael S. Tsirkin

From: Marcel Apfelbaum 

This minimizes the chance of assigning
the same bit to different features.
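
The idea can be sketched with made-up flag names: the enum hands out bit
positions sequentially, so two flags cannot end up on the same bit unless
an enumerator is given an explicit value. Note that the enum order in this
patch keeps the previous numeric values 0 through 6. Everything named
SKETCH_* below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assigned by the compiler: 0, 1, 2, ... */
enum {
    SKETCH_FLAG_FIRST_BIT,
    SKETCH_FLAG_SECOND_BIT,
    SKETCH_FLAG_THIRD_BIT,
    SKETCH_FLAG_NR_BITS     /* number of bits in use, handy for checks */
};

#define SKETCH_FLAG_FIRST  (UINT32_C(1) << SKETCH_FLAG_FIRST_BIT)
#define SKETCH_FLAG_SECOND (UINT32_C(1) << SKETCH_FLAG_SECOND_BIT)
#define SKETCH_FLAG_THIRD  (UINT32_C(1) << SKETCH_FLAG_THIRD_BIT)

/* Test a single flag by its bit position. */
static bool sketch_flag_is_set(uint32_t flags, unsigned bit)
{
    assert(bit < SKETCH_FLAG_NR_BITS);
    return (flags & (UINT32_C(1) << bit)) != 0;
}
```

Adding a new flag then means adding one enumerator before the final
counter, with no number to pick by hand.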

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Laurent Vivier 
Acked-by: Jason Wang 
---
 hw/virtio/virtio-pci.h | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 6686b10..e4548c2 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -58,30 +58,33 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
 #define VIRTIO_PCI_BUS_CLASS(klass) \
 OBJECT_CLASS_CHECK(VirtioPCIBusClass, klass, TYPE_VIRTIO_PCI_BUS)
 
+enum {
+VIRTIO_PCI_FLAG_BUS_MASTER_BUG_MIGRATION_BIT,
+VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT,
+VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT,
+VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT,
+VIRTIO_PCI_FLAG_MIGRATE_EXTRA_BIT,
+VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT,
+VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT,
+};
+
 /* Need to activate work-arounds for buggy guests at vmstate load. */
-#define VIRTIO_PCI_FLAG_BUS_MASTER_BUG_MIGRATION_BIT  0
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG_MIGRATION \
 (1 << VIRTIO_PCI_FLAG_BUS_MASTER_BUG_MIGRATION_BIT)
 
 /* Performance improves when virtqueue kick processing is decoupled from the
  * vcpu thread using ioeventfd for some devices. */
-#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
 
 /* virtio version flags */
-#define VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT 2
-#define VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT 3
-#define VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT 6
 #define VIRTIO_PCI_FLAG_DISABLE_LEGACY (1 << VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT)
 #define VIRTIO_PCI_FLAG_DISABLE_MODERN (1 << VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT)
 #define VIRTIO_PCI_FLAG_DISABLE_PCIE (1 << VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT)
 
 /* migrate extra state */
-#define VIRTIO_PCI_FLAG_MIGRATE_EXTRA_BIT 4
 #define VIRTIO_PCI_FLAG_MIGRATE_EXTRA (1 << VIRTIO_PCI_FLAG_MIGRATE_EXTRA_BIT)
 
 /* have pio notification for modern device ? */
-#define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT 5
 #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
 (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
 
-- 
MST

[Qemu-devel] [PULL 12/16] i386/acpi: make floppy controller object dynamic

2016-03-03 Thread Michael S. Tsirkin

From: Roman Kagan 

Instead of statically declaring the floppy controller in DSDT, with its
_STA method depending on some obscure bit in the parent ISA bridge, add
the object dynamically to DSDT via AML API only when the controller is
present.

The _STA method is no longer necessary and is therefore dropped.  So are
the declarations of the fields indicating whether the controller is
enabled.

Signed-off-by: Roman Kagan 
Signed-off-by: Igor Mammedov 
Reviewed-by: Marcel Apfelbaum 
Cc: "Michael S. Tsirkin" 
Cc: John Snow 
Cc: Laszlo Ersek 
Cc: Kevin O'Connor 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/acpi-build.c | 27 +++
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0fc83e8..1560c75 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1232,29 +1232,10 @@ static Aml *build_fdc_device_aml(void)
 {
 Aml *dev;
 Aml *crs;
-Aml *method;
-Aml *if_ctx;
-Aml *else_ctx;
-Aml *zero = aml_int(0);
-Aml *is_present = aml_local(0);
 
 dev = aml_device("FDC0");
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0700")));
 
-method = aml_method("_STA", 0, AML_NOTSERIALIZED);
-aml_append(method, aml_store(aml_name("FDEN"), is_present));
-if_ctx = aml_if(aml_equal(is_present, zero));
-{
-aml_append(if_ctx, aml_return(aml_int(0x00)));
-}
-aml_append(method, if_ctx);
-else_ctx = aml_else();
-{
-aml_append(else_ctx, aml_return(aml_int(0x0f)));
-}
-aml_append(method, else_ctx);
-aml_append(dev, method);
-
 crs = aml_resource_template();
 aml_append(crs, aml_io(AML_DECODE16, 0x03F2, 0x03F2, 0x00, 0x04));
 aml_append(crs, aml_io(AML_DECODE16, 0x03F7, 0x03F7, 0x00, 0x01));
@@ -1412,7 +1393,9 @@ static void build_isa_devices_aml(Aml *table)
 aml_append(scope, build_rtc_device_aml());
 aml_append(scope, build_kbd_device_aml());
 aml_append(scope, build_mouse_device_aml());
-aml_append(scope, build_fdc_device_aml());
+if (pc_find_fdc0()) {
+aml_append(scope, build_fdc_device_aml());
+}
 aml_append(scope, build_lpt_device_aml());
 aml_append(scope, build_com_device_aml(1));
 aml_append(scope, build_com_device_aml(2));
@@ -1781,8 +1764,6 @@ static void build_q35_isa_bridge(Aml *table)
 aml_append(field, aml_named_field("COMB", 3));
 aml_append(field, aml_reserved_field(1));
 aml_append(field, aml_named_field("LPTD", 2));
-aml_append(field, aml_reserved_field(2));
-aml_append(field, aml_named_field("FDCD", 2));
 aml_append(dev, field);
 
 aml_append(dev, aml_operation_region("LPCE", AML_PCI_CONFIG,
@@ -1792,7 +1773,6 @@ static void build_q35_isa_bridge(Aml *table)
 aml_append(field, aml_named_field("CAEN", 1));
 aml_append(field, aml_named_field("CBEN", 1));
 aml_append(field, aml_named_field("LPEN", 1));
-aml_append(field, aml_named_field("FDEN", 1));
 aml_append(dev, field);
 
 aml_append(scope, dev);
@@ -1840,7 +1820,6 @@ static void build_piix4_isa_bridge(Aml *table)
 aml_append(field, aml_reserved_field(3));
 aml_append(field, aml_named_field("CBEN", 1));
 aml_append(dev, field);
-aml_append(dev, aml_name_decl("FDEN", aml_int(1)));
 
 aml_append(scope, dev);
 aml_append(table, scope);
-- 
MST

[Qemu-devel] [PULL 13/16] i386: expose floppy drive CMOS type

2016-03-03 Thread Michael S. Tsirkin

From: Roman Kagan 

Make it possible to query the CMOS type of a floppy drive outside of the
source file where it's defined.

This makes it possible to properly populate the corresponding ACPI objects
and thus enable Windows on BIOS-less systems to access the floppy drives.

Signed-off-by: Roman Kagan 
Cc: Igor Mammedov 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: John Snow 
Cc: Laszlo Ersek 
Cc: Kevin O'Connor 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/i386/pc.h | 1 +
 hw/i386/pc.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8b3546e..472754c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -265,6 +265,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
 
 ISADevice *pc_find_fdc0(void);
+int cmos_get_fd_drive_type(FloppyDriveType fd0);
 
 /* acpi_piix.c */
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0aeefd2..d7fea61 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -200,7 +200,7 @@ static void pic_irq_request(void *opaque, int irq, int level)
 
 #define REG_EQUIPMENT_BYTE  0x14
 
-static int cmos_get_fd_drive_type(FloppyDriveType fd0)
+int cmos_get_fd_drive_type(FloppyDriveType fd0)
 {
 int val;
 
-- 
MST

[Qemu-devel] [PULL 03/16] acpi: allow using object as offset for OperationRegion

2016-03-03 Thread Michael S. Tsirkin

From: Xiao Guangrong 

Extend aml_operation_region() to use object as offset

Reviewed-by: Igor Mammedov 
Signed-off-by: Xiao Guangrong 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h |  2 +-
 hw/acpi/aml-build.c |  4 ++--
 hw/i386/acpi-build.c| 31 ---
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 258cbf3..b16017e 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -285,7 +285,7 @@ Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
 Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
 uint8_t aln, uint8_t len);
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len);
+  Aml *offset, uint32_t len);
 Aml *aml_irq_no_flags(uint8_t irq);
 Aml *aml_named_field(const char *name, unsigned length);
 Aml *aml_reserved_field(unsigned length);
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index bb0cf52..f26fa26 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -942,14 +942,14 @@ Aml *aml_package(uint8_t num_elements)
 
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefOpRegion */
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len)
+  Aml *offset, uint32_t len)
 {
 Aml *var = aml_alloc();
 build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
 build_append_byte(var->buf, 0x80); /* OpRegionOp */
 build_append_namestring(var->buf, "%s", name);
 build_append_byte(var->buf, rs);
-build_append_int(var->buf, offset);
+aml_append(var, offset);
 build_append_int(var->buf, len);
 return var;
 }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 52c9470..0fc83e8 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -993,7 +993,7 @@ static void build_processor_devices(Aml *sb_scope, unsigned acpi_cpus,
 aml_append(sb_scope, dev);
 /* declare CPU hotplug MMIO region and PRS field to access it */
 aml_append(sb_scope, aml_operation_region(
-"PRST", AML_SYSTEM_IO, pm->cpu_hp_io_base, pm->cpu_hp_io_len));
+"PRST", AML_SYSTEM_IO, aml_int(pm->cpu_hp_io_base), 
pm->cpu_hp_io_len));
 field = aml_field("PRST", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
 aml_append(field, aml_named_field("PRS", 256));
 aml_append(sb_scope, field);
@@ -1078,7 +1078,7 @@ static void build_memory_devices(Aml *sb_scope, int nr_mem,
 
 aml_append(scope, aml_operation_region(
 MEMORY_HOTPLUG_IO_REGION, AML_SYSTEM_IO,
-io_base, io_len)
+aml_int(io_base), io_len)
 );
 
 field = aml_field(MEMORY_HOTPLUG_IO_REGION, AML_DWORD_ACC,
@@ -1192,7 +1192,8 @@ static void build_hpet_aml(Aml *table)
 aml_append(dev, aml_name_decl("_UID", zero));
 
 aml_append(dev,
-aml_operation_region("HPTM", AML_SYSTEM_MEMORY, HPET_BASE, HPET_LEN));
+aml_operation_region("HPTM", AML_SYSTEM_MEMORY, aml_int(HPET_BASE),
+ HPET_LEN));
 field = aml_field("HPTM", AML_DWORD_ACC, AML_LOCK, AML_PRESERVE);
 aml_append(field, aml_named_field("VEND", 32));
 aml_append(field, aml_named_field("PRD", 32));
@@ -1430,7 +1431,7 @@ static void build_dbg_aml(Aml *table)
 Aml *idx = aml_local(2);
 
 aml_append(scope,
-   aml_operation_region("DBG", AML_SYSTEM_IO, 0x0402, 0x01));
+   aml_operation_region("DBG", AML_SYSTEM_IO, aml_int(0x0402), 0x01));
 field = aml_field("DBG", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
 aml_append(field, aml_named_field("DBGB", 8));
 aml_append(scope, field);
@@ -1770,10 +1771,10 @@ static void build_q35_isa_bridge(Aml *table)
 
 /* ICH9 PCI to ISA irq remapping */
 aml_append(dev, aml_operation_region("PIRQ", AML_PCI_CONFIG,
- 0x60, 0x0C));
+ aml_int(0x60), 0x0C));
 
 aml_append(dev, aml_operation_region("LPCD", AML_PCI_CONFIG,
- 0x80, 0x02));
+ aml_int(0x80), 0x02));
 field = aml_field("LPCD", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
 aml_append(field, aml_named_field("COMA", 3));
 aml_append(field, aml_reserved_field(1));
@@ -1785,7 +1786,7 @@ static void build_q35_isa_bridge(Aml *table)
 aml_append(dev, field);
 
 aml_append(dev, aml_operation_region("LPCE", AML_PCI_CONFIG,
- 0x82, 0x02));
+ aml_int(0x82), 0x02));
 /* enable bits */
 field = aml_field("LPCE", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
 aml_append(field,

[Qemu-devel] [PULL 11/16] pc-dimm: fix error handling in pc_dimm_check_memdev_is_busy()

2016-03-03 Thread Michael S. Tsirkin

From: Igor Mammedov 

If host_memory_backend_get_memory() were to return an error and a
NULL MemoryRegion, pc_dimm_check_memdev_is_busy() would crash
dereferencing the NULL pointer in memory_region_is_mapped().
If instead an error is set and a non-NULL MemoryRegion is returned,
error_setg() will fail the "error already set" assertion
in error_setv().

To avoid both issues, use the typical error handling pattern
for property setters:

Error *local_err = NULL;
...
error_propagate(errp, local_err);
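
The pattern above can be sketched outside of QEMU as follows. SketchError
and the sketch_* helpers are simplified, hypothetical stand-ins for Error,
error_setg() and error_propagate(); they only model the two behaviours that
matter here (setting an already-set error asserts, and propagation either
hands the error over or frees it).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchError {
    char *msg;
} SketchError;

static void sketch_error_set(SketchError **errp, const char *msg)
{
    SketchError *err;

    if (errp == NULL) {
        return;                    /* caller does not want the error */
    }
    assert(*errp == NULL);         /* models "error already set" */
    err = malloc(sizeof(*err));
    if (err == NULL) {
        abort();
    }
    err->msg = malloc(strlen(msg) + 1);
    if (err->msg == NULL) {
        abort();
    }
    strcpy(err->msg, msg);
    *errp = err;
}

static void sketch_error_free(SketchError *err)
{
    if (err != NULL) {
        free(err->msg);
        free(err);
    }
}

static void sketch_error_propagate(SketchError **dst, SketchError *local)
{
    if (local == NULL) {
        return;
    }
    if (dst != NULL && *dst == NULL) {
        *dst = local;              /* hand ownership to the caller */
    } else {
        sketch_error_free(local);  /* nobody to give it to */
    }
}

/* Hypothetical callee: fails with NULL when backend_ok is zero. */
static const int *sketch_get_region(int backend_ok, const int *region,
                                    SketchError **errp)
{
    if (!backend_ok) {
        sketch_error_set(errp, "backend has no memory");
        return NULL;
    }
    return region;
}

/* *mapped != 0 stands for "memory region is already mapped". */
static void sketch_check_busy(int backend_ok, const int *mapped,
                              SketchError **errp)
{
    SketchError *local_err = NULL;
    const int *mr = sketch_get_region(backend_ok, mapped, &local_err);

    if (local_err != NULL) {
        goto out;                  /* never touch mr after a failure */
    }
    if (*mr) {
        sketch_error_set(&local_err, "can't use already busy memdev");
    }
out:
    sketch_error_propagate(errp, local_err);
}
```

Because every step reports into local_err, the function can test for
failure even when the caller passed a NULL errp, and the caller's errp is
written at most once.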

Reported-by: Markus Armbruster 
Signed-off-by: Igor Mammedov 
Reviewed-by: Markus Armbruster 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/mem/pc-dimm.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 650f0f8..973bf20 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -364,15 +364,22 @@ static void pc_dimm_check_memdev_is_busy(Object *obj, const char *name,
   Object *val, Error **errp)
 {
 MemoryRegion *mr;
+Error *local_err = NULL;
 
-mr = host_memory_backend_get_memory(MEMORY_BACKEND(val), errp);
+mr = host_memory_backend_get_memory(MEMORY_BACKEND(val), &local_err);
+if (local_err) {
+goto out;
+}
 if (memory_region_is_mapped(mr)) {
 char *path = object_get_canonical_path_component(val);
-error_setg(errp, "can't use already busy memdev: %s", path);
+error_setg(&local_err, "can't use already busy memdev: %s", path);
 g_free(path);
 } else {
-qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
+qdev_prop_allow_set_link_before_realize(obj, name, val, &local_err);
 }
+
+out:
+error_propagate(errp, local_err);
 }
 
 static void pc_dimm_init(Object *obj)
-- 
MST

[Qemu-devel] [PULL 05/16] balloon: fix segfault and harden the stats queue

2016-03-03 Thread Michael S. Tsirkin

From: Ladi Prosek 

The segfault here is triggered by the driver notifying the stats queue
twice after adding a buffer to it. This effectively resets stats_vq_elem
back to NULL and QEMU crashes on the next stats timer tick in
balloon_stats_poll_cb.

This is a regression introduced in 51b19ebe4320f3dc, although admittedly
the device assumed too much about the stats queue protocol even before
that commit. This commit adds a few more checks and ensures that the one
stats buffer gets deallocated on device reset.
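
A toy model of the hardened handling, using hypothetical Sketch* types in
place of VirtQueueElement and VirtIOBalloon: the device holds at most one
stats buffer, a notification with nothing queued leaves the held buffer
alone, an unexpected second buffer causes the first to be returned, and
reset drops whatever is held. The "returned" counter merely stands in for
virtqueue_push() plus virtio_notify().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SketchElem {
    int id;
} SketchElem;

typedef struct SketchBalloon {
    SketchElem *stats_elem;  /* the single buffer the device may hold */
    int returned;            /* buffers handed back to the driver */
} SketchBalloon;

static SketchElem *sketch_elem_new(int id)
{
    SketchElem *e = malloc(sizeof(*e));

    if (e == NULL) {
        abort();
    }
    e->id = id;
    return e;
}

/* Driver notified the stats queue; elem is NULL if nothing was queued. */
static void sketch_receive(SketchBalloon *s, SketchElem *elem)
{
    if (elem == NULL) {
        return;              /* spurious notify: keep the held buffer */
    }
    if (s->stats_elem != NULL) {
        s->returned++;       /* driver broke protocol: return old one */
        free(s->stats_elem);
    }
    s->stats_elem = elem;
}

/* Timer tick: returns 1 if a buffer was handed back, 0 otherwise. */
static int sketch_poll(SketchBalloon *s)
{
    if (s->stats_elem == NULL) {
        return 0;            /* nothing held, just reschedule */
    }
    s->returned++;
    free(s->stats_elem);
    s->stats_elem = NULL;
    return 1;
}

static void sketch_reset(SketchBalloon *s)
{
    free(s->stats_elem);
    s->stats_elem = NULL;
}
```

The key difference from the pre-fix behaviour is in sketch_receive(): the
held pointer is only overwritten once a real element has been popped.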

Cc: qemu-sta...@nongnu.org
Signed-off-by: Ladi Prosek 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-balloon.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e9c30e9..e97d403 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -101,7 +101,7 @@ static void balloon_stats_poll_cb(void *opaque)
 VirtIOBalloon *s = opaque;
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (!balloon_stats_supported(s)) {
+if (s->stats_vq_elem == NULL || !balloon_stats_supported(s)) {
 /* re-schedule */
 balloon_stats_change_timer(s, s->stats_poll_interval);
 return;
@@ -258,11 +258,20 @@ static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
 size_t offset = 0;
 qemu_timeval tv;
 
-s->stats_vq_elem = elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
 if (!elem) {
 goto out;
 }
 
+if (s->stats_vq_elem != NULL) {
+/* This should never happen if the driver follows the spec. */
+virtqueue_push(vq, s->stats_vq_elem, 0);
+virtio_notify(vdev, vq);
+g_free(s->stats_vq_elem);
+}
+
+s->stats_vq_elem = elem;
+
 /* Initialize the stats to get rid of any stale values.  This is only
  * needed to handle the case where a guest supports fewer stats than it
  * used to (ie. it has booted into an old kernel).
@@ -458,6 +467,16 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
 virtio_cleanup(vdev);
 }
 
+static void virtio_balloon_device_reset(VirtIODevice *vdev)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+
+if (s->stats_vq_elem != NULL) {
+g_free(s->stats_vq_elem);
+s->stats_vq_elem = NULL;
+}
+}
+
 static void virtio_balloon_instance_init(Object *obj)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(obj);
@@ -486,6 +505,7 @@ static void virtio_balloon_class_init(ObjectClass *klass, void *data)
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 vdc->realize = virtio_balloon_device_realize;
 vdc->unrealize = virtio_balloon_device_unrealize;
+vdc->reset = virtio_balloon_device_reset;
 vdc->get_config = virtio_balloon_get_config;
 vdc->set_config = virtio_balloon_set_config;
 vdc->get_features = virtio_balloon_get_features;
-- 
MST

[Qemu-devel] [PULL 02/16] acpi: add aml_concatenate()

2016-03-03 Thread Michael S. Tsirkin

From: Xiao Guangrong 

It will be used by nvdimm acpi

Signed-off-by: Xiao Guangrong 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h | 1 +
 hw/acpi/aml-build.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 7d26911..258cbf3 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -353,6 +353,7 @@ Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
 
 void
 build_header(GArray *linker, GArray *table_data,
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 45b7f0a..bb0cf52 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1437,6 +1437,13 @@ Aml *aml_alias(const char *source_object, const char *alias_object)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefConcat */
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target)
+{
+return build_opcode_2arg_dst(0x73 /* ConcatOp */, source1, source2,
+ target);
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev,
-- 
MST

[Qemu-devel] [PULL 08/16] virtio-balloon: export all balloon statistics

2016-03-03 Thread Michael S. Tsirkin

From: Igor Redko 

We are making experiments with different autoballooning strategies
based on the guest behavior. Thus we need to experiment with different
guest statistics. For now every counter change requires QEMU recompilation
and dances with Libvirt.

This patch introduces transport for unrecognized counters in virtio-balloon.
This transport can be used for measuring benefits from using new
balloon counters, before submitting any patches. Current alternative
is 'guest-exec' transport which isn't made for such delicate matters
and can influence test results.

Originally all counters with tag >= VIRTIO_BALLOON_S_NR were ignored.
Instead of this we keep first (VIRTIO_BALLOON_S_NR + 32) counters from the
queue and pass unrecognized ones with the following names: 'x-stat-XXXX',
where XXXX is a tag number in hex. Defined counters are reported with their
regular names.
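
The naming rule can be sketched like this. The function name and the
fixed-size table are hypothetical, and the table assumes the six tags
defined at the time of this patch; the "x-stat-%04x" format is the one the
patch itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed number of defined tags for the sketch. */
#define SKETCH_S_NR 6

static const char *const sketch_stat_names[SKETCH_S_NR] = {
    "stat-swap-in",
    "stat-swap-out",
    "stat-major-faults",
    "stat-minor-faults",
    "stat-free-memory",
    "stat-total-memory",
};

/* Write the reported name for a counter tag into buf. */
static void sketch_stat_name(uint16_t tag, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (tag < SKETCH_S_NR) {
        /* Defined counters keep their regular names. */
        snprintf(buf, len, "%s", sketch_stat_names[tag]);
    } else {
        /* Unrecognized counters are exposed by tag, in 4-digit hex. */
        snprintf(buf, len, "x-stat-%04x", (unsigned)tag);
    }
}
```

The "x-" prefix marks these names as experimental, so tooling can tell
them apart from counters with a stable meaning.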

Signed-off-by: Igor Redko 
Signed-off-by: Denis V. Lunev 
CC: Michael S. Tsirkin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 configure  | 12 
 include/hw/virtio/virtio-balloon.h |  3 ++-
 hw/virtio/virtio-balloon.c | 32 ++--
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index 0c0472a..767d96e 100755
--- a/configure
+++ b/configure
@@ -315,6 +315,7 @@ vhdx=""
 numa=""
 tcmalloc="no"
 jemalloc="no"
+unknown_balloon_stats="no"
 
 # parse CC options first
 for opt do
@@ -1142,6 +1143,10 @@ for opt do
   ;;
   --enable-jemalloc) jemalloc="yes"
   ;;
+  --enable-unknown-balloon-stats) unknown_balloon_stats="yes"
+  ;;
+  --disable-unknown-balloon-stats) unknown_balloon_stats="no"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1364,6 +1369,8 @@ disabled with --disable-FEATURE, default is enabled if available:
   numalibnuma support
   tcmalloctcmalloc support
   jemallocjemalloc support
+  unknown-balloon-stats  report unknown balloon statistics counters
+  ;;
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -4790,6 +4797,7 @@ echo "bzip2 support $bzip2"
 echo "NUMA host support $numa"
 echo "tcmalloc support  $tcmalloc"
 echo "jemalloc support  $jemalloc"
+echo "unknown balloon stat counters support  $unknown_balloon_stats"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5342,6 +5350,10 @@ if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
 
+if test "$unknown_balloon_stats" = "yes" ; then
+  echo "CONFIG_UNKNOWN_BALLOON_STATS=y" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 # a thread we have a handle to
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 35f62ac..5c8730e 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -36,7 +36,8 @@ typedef struct VirtIOBalloon {
 VirtQueue *ivq, *dvq, *svq;
 uint32_t num_pages;
 uint32_t actual;
-uint64_t stats[VIRTIO_BALLOON_S_NR];
+VirtIOBalloonStatModern stats[VIRTIO_BALLOON_S_NR + 32];
+uint16_t stats_cnt;
 VirtQueueElement *stats_vq_elem;
 size_t stats_vq_offset;
 QEMUTimer *stats_timer;
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e97d403..64367ac 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -66,8 +66,7 @@ static const char *balloon_stat_names[] = {
  */
 static inline void reset_stats(VirtIOBalloon *dev)
 {
-int i;
-for (i = 0; i < VIRTIO_BALLOON_S_NR; dev->stats[i++] = -1);
+dev->stats_cnt = 0;
 }
 
 static bool balloon_stats_supported(const VirtIOBalloon *s)
@@ -133,12 +132,22 @@ static void balloon_stats_get_all(Object *obj, Visitor *v, const char *name,
 if (err) {
 goto out_end;
 }
-for (i = 0; i < VIRTIO_BALLOON_S_NR; i++) {
-visit_type_uint64(v, balloon_stat_names[i], &s->stats[i], &err);
+for (i = 0; i < s->stats_cnt; i++) {
+if (s->stats[i].tag < VIRTIO_BALLOON_S_NR) {
+visit_type_uint64(v, balloon_stat_names[s->stats[i].tag],
+  &s->stats[i].val, &err);
+} else {
+#if defined(CONFIG_UNKNOWN_BALLOON_STATS)
+gchar *str = g_strdup_printf("x-stat-%04x", s->stats[i].tag);
+visit_type_uint64(v, str, &s->stats[i].val, &err);
+g_free(str);
+#endif
+}
 if (err) {
 break;
 }
 }
+
 error_propagate(errp, err);
 err = NULL;
 visit_end_struct(v, &err);
@@ -282,10 +291,21 @@ static void virtio_balloon_receive_stats(VirtIODevice 
*vdev, VirtQueue *vq)
==

[Qemu-devel] [PULL 06/16] hw/virtio: fix double use of a virtio flag

2016-03-03 Thread Michael S. Tsirkin

From: Marcel Apfelbaum 

Commits 1811e64c and a6df8adf use the same virtio feature bit 4
for different features.

Fix it by using different bits.
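
To illustrate why sharing a bit number is a problem, here is a minimal
sketch. The helper names are hypothetical; only the bit numbers 4 and 6
come from the patch.

```c
#include <assert.h>

/* Hypothetical helper: build a flag mask from a bit number. */
#define SKETCH_FLAG(bit) (1u << (bit))

/*
 * Two features collide when their masks overlap: setting or testing one
 * then also sets or tests the other.
 */
static int sketch_flags_collide(unsigned bit_a, unsigned bit_b)
{
    return (SKETCH_FLAG(bit_a) & SKETCH_FLAG(bit_b)) != 0;
}
```

sketch_flags_collide(4, 4) is non-zero, which is the situation being
fixed; after moving one feature to bit 6, sketch_flags_collide(4, 6) is 0.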

Reported-by: Laurent Vivier 
Tested-by: Laurent Vivier 
Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Jason Wang 
---
 hw/virtio/virtio-pci.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index e096e98..6686b10 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -71,7 +71,7 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
 /* virtio version flags */
 #define VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT 2
 #define VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT 3
-#define VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT 4
+#define VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT 6
 #define VIRTIO_PCI_FLAG_DISABLE_LEGACY (1 << VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT)
 #define VIRTIO_PCI_FLAG_DISABLE_MODERN (1 << VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT)
 #define VIRTIO_PCI_FLAG_DISABLE_PCIE (1 << VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT)
-- 
MST

[Qemu-devel] [PULL 04/16] acpi: add build_append_named_dword, returning an offset in buffer

2016-03-03 Thread Michael S. Tsirkin

This is a very limited form of support for runtime patching,
similar in functionality to what we can do with the ACPI_EXTRACT
macros in Python, but implemented in C.

This is to allow ACPI code direct access to data tables,
which is exactly what DataTableRegion is there for, except
that no known Windows release so far implements DataTableRegion.
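
As a rough illustration of the idea (not the QEMU implementation), the
sketch below emits the same byte sequence into a plain buffer and later
patches the dword in place. struct sketch_buf and all sketch_* helpers
are hypothetical; the opcodes 0x08 (NameOp) and 0x0C (DWordPrefix) are
the ones used in the patch, and the name is assumed to be a single
4-character segment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for the GArray. */
struct sketch_buf {
    uint8_t data[64];
    size_t len;
};

static void sketch_append(struct sketch_buf *b, const void *src, size_t n)
{
    assert(b->len + n <= sizeof(b->data));
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/*
 * Emit NameOp, a 4-character name segment, DWordPrefix and a zero dword.
 * Return the offset of the dword so it can be patched later.
 */
static size_t sketch_named_dword(struct sketch_buf *b, const char *name)
{
    const uint8_t name_op = 0x08, dword_prefix = 0x0C;
    const uint8_t zero[4] = { 0, 0, 0, 0 };
    size_t offset;

    sketch_append(b, &name_op, 1);
    sketch_append(b, name, 4);
    sketch_append(b, &dword_prefix, 1);
    offset = b->len;
    sketch_append(b, zero, 4);
    return offset;
}

/* Patch the dword in place, least significant byte first. */
static void sketch_patch_dword(struct sketch_buf *b, size_t offset, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        b->data[offset + i] = (uint8_t)(v >> (8 * i));
    }
}
```

The caller keeps the returned offset and writes the real value once it
is known, which is the role the returned offset plays in the patch.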

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Xiao Guangrong 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h |  3 +++
 hw/acpi/aml-build.c | 28 
 2 files changed, 31 insertions(+)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index b16017e..66f48ec 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -368,4 +368,7 @@ void
 build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets,
const char *oem_id, const char *oem_table_id);
 
+int
+build_append_named_dword(GArray *array, const char *name_format, ...);
+
 #endif
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f26fa26..ab89ca6 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -258,6 +258,34 @@ static void build_append_int(GArray *table, uint64_t value)
 }
 }
 
+/*
+ * Build NAME(XXXX, 0x00000000) where 0x00000000 is encoded as a dword,
+ * and return the offset to 0x00000000 for runtime patching.
+ *
+ * Warning: runtime patching is best avoided. Only use this as
+ * a replacement for DataTableRegion (for guests that don't
+ * support it).
+ */
+int
+build_append_named_dword(GArray *array, const char *name_format, ...)
+{
+int offset;
+va_list ap;
+
+build_append_byte(array, 0x08); /* NameOp */
+va_start(ap, name_format);
+build_append_namestringv(array, name_format, ap);
+va_end(ap);
+
+build_append_byte(array, 0x0C); /* DWordPrefix */
+
+offset = array->len;
+build_append_int_noprefix(array, 0x00000000, 4);
+assert(array->len == offset + 4);
+
+return offset;
+}
+
 static GPtrArray *alloc_list;
 
 static Aml *aml_alloc(void)
-- 
MST

[Qemu-devel] [PULL 00/16] vhost, virtio, pci, pc, acpi

2016-03-03 Thread Michael S. Tsirkin

The following changes since commit ed6128ebbdd7cd885d39980659dad4b5c8ae8158:

  Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging (2016-03-01 15:54:03 +0000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to aa5976e7afb9407f64ec0bae1f724d0d21bff959:

  i386: update expected DSDT (2016-03-02 18:55:35 +0200)


vhost, virtio, pci, pc, acpi

Fixes all over the place.
Floppy device ACPI rewrite.
Beginning of the nvdimm work.
New statistics for virtio-balloon.

Signed-off-by: Michael S. Tsirkin 


Denis V. Lunev (1):
  virtio-balloon: add 'available' counter

Igor Mammedov (1):
  pc-dimm: fix error handling in pc_dimm_check_memdev_is_busy()

Igor Redko (1):
  virtio-balloon: export all balloon statistics

Ilya Maximets (1):
  vhost-user: verify that number of queues is less than MAX_QUEUE_NUM

Ladi Prosek (1):
  balloon: fix segfault and harden the stats queue

Marcel Apfelbaum (2):
  hw/virtio: fix double use of a virtio flag
  hw/virtio: group virtio flags into an enum

Michael S. Tsirkin (2):
  acpi: add build_append_named_dword, returning an offset in buffer
  i386: update expected DSDT

Roman Kagan (4):
  i386/acpi: make floppy controller object dynamic
  i386: expose floppy drive CMOS type
  fdc: add function to determine drive chs limits
  i386: populate floppy drive information in DSDT

Xiao Guangrong (3):
  acpi: add aml_create_field()
  acpi: add aml_concatenate()
  acpi: allow using object as offset for OperationRegion

 configure   |  12 +++
 hw/virtio/virtio-pci.h  |  17 ++--
 include/hw/acpi/aml-build.h |   8 +-
 include/hw/block/fdc.h  |   2 +
 include/hw/i386/pc.h|   1 +
 include/hw/virtio/virtio-balloon.h  |   3 +-
 include/standard-headers/linux/virtio_balloon.h |   3 +-
 hw/acpi/aml-build.c |  53 +-
 hw/block/fdc.c  |  23 +
 hw/i386/acpi-build.c| 123 
 hw/i386/pc.c|   2 +-
 hw/mem/pc-dimm.c|  13 ++-
 hw/virtio/virtio-balloon.c  |  57 +--
 net/vhost-user.c|   5 +-
 tests/acpi-test-data/pc/DSDT| Bin 5478 -> 5527 bytes
 tests/acpi-test-data/pc/DSDT.bridge | Bin 7337 -> 7386 bytes
 tests/acpi-test-data/q35/DSDT   | Bin 8321 -> 8233 bytes
 tests/acpi-test-data/q35/DSDT.bridge| Bin 8338 -> 8250 bytes
 18 files changed, 256 insertions(+), 66 deletions(-)

[Qemu-devel] [PULL 01/16] acpi: add aml_create_field()

2016-03-03 Thread Michael S. Tsirkin

From: Xiao Guangrong 

It will be used by nvdimm acpi

Signed-off-by: Xiao Guangrong 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/aml-build.h |  2 ++
 hw/acpi/aml-build.c | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index d3e0c8f..7d26911 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -344,6 +344,8 @@ Aml *aml_mutex(const char *name, uint8_t sync_level);
 Aml *aml_acquire(Aml *mutex, uint16_t timeout);
 Aml *aml_release(Aml *mutex);
 Aml *aml_alias(const char *source_object, const char *alias_object);
+Aml *aml_create_field(Aml *srcbuf, Aml *bit_index, Aml *num_bits,
+  const char *name);
 Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name);
 Aml *aml_create_qword_field(Aml *srcbuf, Aml *index, const char *name);
 Aml *aml_varpackage(uint32_t num_elements);
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6675535..45b7f0a 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -997,6 +997,20 @@ Aml *create_field_common(int opcode, Aml *srcbuf, Aml *index, const char *name)
 return var;
 }
 
+/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *bit_index, Aml *num_bits,
+  const char *name)
+{
+Aml *var = aml_alloc();
+build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+aml_append(var, srcbuf);
+aml_append(var, bit_index);
+aml_append(var, num_bits);
+build_append_namestring(var->buf, "%s", name);
+return var;
+}
+
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateDWordField */
 Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name)
 {
-- 
MST

Re: [Qemu-devel] [PATCH v8 7/7] s390x/cpu: Allow hotplug of CPUs

2016-03-03 Thread David Hildenbrand

> Implement cpu hotplug routine and add the machine hook.
> 
> Signed-off-by: Matthew Rosato 
> Reviewed-by: David Hildenbrand 
> ---
>  hw/s390x/s390-virtio-ccw.c | 13 +
>  target-s390x/cpu.c |  7 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 7fc1879..174a2f8 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -186,6 +186,18 @@ static HotplugHandler *s390_get_hotplug_handler(MachineState *machine,
>  return NULL;
>  }
> 
> +static void s390_hot_add_cpu(const int64_t id, Error **errp)
> +{
> +MachineState *machine = MACHINE(qdev_get_machine());
> +Error *err = NULL;
> +
> +s390_new_cpu(machine->cpu_model, id, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}

You could unconditionally call error_propagate(errp, err) here; it does
nothing when err is NULL.

> +}
> +


Still looks good to me!

David

Re: [Qemu-devel] [PATCH v8 3/7] s390x/cpu: Get rid of side effects when creating a vcpu

2016-03-03 Thread David Hildenbrand

> In preparation for hotplug, defer some CPU initialization
> until the device is actually being realized, including
> cpu_exec_init.
> 
> Signed-off-by: Matthew Rosato 

Looks good to me!

Reviewed-by: David Hildenbrand 

David

Re: [Qemu-devel] [PATCH 2/4] loader: Add load_image_mr() to load ROM image to a MemoryRegion

2016-03-03 Thread Michael S. Tsirkin

On Thu, Mar 03, 2016 at 05:46:28PM +0100, Paolo Bonzini wrote:
> 
> 
> On 12/02/2016 15:45, Peter Maydell wrote:
> > Add a new function load_image_mr(), which behaves like
> > load_image_targphys() except that it loads the ROM image to
> > a specified MemoryRegion rather than to a specified physical
> > address. This is useful when a ROM blob needs to be loaded
> > to a particular flash or ROM device but the address of that
> > device in the machine's address space is not known. (For
> > instance, ROMs in devices, or ROMs which might exist in
> > a different address space to the system address space.)
> > 
> > Signed-off-by: Peter Maydell 
> 
> The patch looks good, in particular it should be fine for the non-fw_cfg
> uses of rom->mr.
> 
> The fw_cfg interface to loader.c indeed should be turned upside-down so
> that the knowledge moves outside rom_add_file (to a rom_add_fwcfg
> function for example) and rom_add_file doesn't need to call rom_set_mr.
>  Your patch is at least a step in the right direction, because it adds
> memory region support in the !fw_cfg case.
> 
> So,
> 
> Reviewed-by: Paolo Bonzini 
> 
> Thanks,
> 
> Paolo

I agree here.

Reviewed-by: Michael S. Tsirkin

Re: [Qemu-devel] [RFC PATCH v2 3/3] VFIO: Type1 IOMMU mapping support for vGPU

2016-03-03 Thread Neo Jia

On Wed, Mar 02, 2016 at 04:38:34PM +0800, Jike Song wrote:
> On 02/24/2016 12:24 AM, Kirti Wankhede wrote:
> > +   vgpu_dma->size = map->size;
> > +
> > +   vgpu_link_dma(vgpu_iommu, vgpu_dma);
> 
> Hi Kirti & Neo,
> 
> seems that no one actually setup mappings for IOMMU here?
> 

Hi Jike,

Yes.

The actual mapping should be done by the host kernel driver after calling the
translation/pinning API vgpu_dma_do_translate.

Thanks,
Neo

> > 
> 
> --
> Thanks,
> Jike
>

[Qemu-devel] [RFC PATCH v1 10/10] spapr: CPU hot unplug support

2016-03-03 Thread Bharata B Rao

Remove the CPU core device by removing the underlying CPU thread devices.
Hot removal of CPU for sPAPR guests is supported by sending the hot unplug
notification to the guest via EPOW interrupt. Release the vCPU object
after CPU hot unplug so that vCPU fd can be parked and reused.
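
The release path has to avoid removing thread objects while the core's
children are still being walked, so removal is deferred to a list and
done after the walk. A minimal sketch of that pattern, with hypothetical
types and names (sketch_child, sketch_pending) in place of the QOM
objects and QLIST macros used in the patch:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CPU thread object. */
struct sketch_child {
    int id;
    int released;
};

/* One entry of the deferred-removal list. */
struct sketch_pending {
    struct sketch_child *child;
    struct sketch_pending *next;
};

/*
 * Visitor called during the walk: only record the child for later,
 * do not release it yet.
 */
static void sketch_defer(struct sketch_child *c, struct sketch_pending **head)
{
    struct sketch_pending *p = malloc(sizeof(*p));

    assert(p);
    p->child = c;
    p->next = *head;
    *head = p;
}

/*
 * Called after the walk has finished: release every recorded child and
 * free the list entries. Returns the number of children released.
 */
static int sketch_cleanup(struct sketch_pending **head)
{
    int n = 0;

    while (*head) {
        struct sketch_pending *p = *head;

        *head = p->next;
        p->child->released = 1;
        free(p);
        n++;
    }
    return n;
}
```

In the patch the same two phases are spapr_cpu_release(), which queues
each thread, and spapr_cpu_core_cleanup(), which unparents them.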

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c  | 16 
 hw/ppc/spapr_cpu_core.c | 84 +
 include/hw/ppc/spapr.h  |  1 +
 include/hw/ppc/spapr_cpu_core.h | 11 ++
 4 files changed, 112 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6c4ac50..a42f8c0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2379,11 +2379,27 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 }
 }
 
+void spapr_cpu_destroy(PowerPCCPU *cpu)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+
+xics_cpu_destroy(spapr->icp, cpu);
+qemu_unregister_reset(spapr_cpu_reset, cpu);
+}
+
 static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
+
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 error_setg(errp, "Memory hot unplug not supported by sPAPR");
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
+if (!smc->dr_cpu_enabled) {
+error_setg(errp, "CPU hot unplug not supported on this machine");
+return;
+}
+spapr_core_unplug(hotplug_dev, dev, errp);
 }
 }
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 4c233d7..5156eb3 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -73,6 +73,90 @@ void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 }
 }
 
+static void spapr_cpu_core_cleanup(struct sPAPRCPUUnplugList *unplug_list)
+{
+sPAPRCPUUnplug *unplug, *next;
+Object *cpu;
+
+QLIST_FOREACH_SAFE(unplug, unplug_list, node, next) {
+cpu = unplug->cpu;
+object_unparent(cpu);
+QLIST_REMOVE(unplug, node);
+g_free(unplug);
+}
+}
+
+static void spapr_add_cpu_to_unplug_list(Object *cpu,
+ struct sPAPRCPUUnplugList *unplug_list)
+{
+sPAPRCPUUnplug *unplug = g_malloc(sizeof(*unplug));
+
+unplug->cpu = cpu;
+QLIST_INSERT_HEAD(unplug_list, unplug, node);
+}
+
+static int spapr_cpu_release(Object *obj, void *opaque)
+{
+DeviceState *dev = DEVICE(obj);
+CPUState *cs = CPU(dev);
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+struct sPAPRCPUUnplugList *unplug_list = opaque;
+
+spapr_cpu_destroy(cpu);
+cpu_remove_sync(cs);
+
+/*
+ * We are still walking the core object's children list, and
+ * hence can't cleanup this CPU thread object just yet. Put
+ * it on a list for later removal.
+ */
+spapr_add_cpu_to_unplug_list(obj, unplug_list);
+return 0;
+}
+
+static void spapr_core_release(DeviceState *dev, void *opaque)
+{
+struct sPAPRCPUUnplugList unplug_list;
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
+char *slot = object_property_get_str(OBJECT(dev), CPU_CORE_SLOT_PROP,
+ &error_fatal);
+
+QLIST_INIT(&unplug_list);
+object_child_foreach(OBJECT(dev), spapr_cpu_release, &unplug_list);
+spapr_cpu_core_cleanup(&unplug_list);
+
+/* Unset the link from machine object to this core */
+object_property_set_link(OBJECT(spapr), NULL, slot, NULL);
+g_free(slot);
+
+g_free(core->threads);
+object_unparent(OBJECT(dev));
+}
+
+void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+   Error **errp)
+{
+sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
+PowerPCCPU *cpu = &core->threads[0];
+int id = ppc_get_vcpu_dt_id(cpu);
+sPAPRDRConnector *drc =
+spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
+sPAPRDRConnectorClass *drck;
+Error *local_err = NULL;
+
+g_assert(drc);
+
+drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+drck->detach(drc, dev, spapr_core_release, NULL, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+spapr_hotplug_req_remove_by_index(drc);
+}
+
 static int spapr_cpu_core_realize_child(Object *child, void *opaque)
 {
 Error **errp = opaque;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index d1a0af8..8fab27b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -593,6 +593,7 @@ void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp);
 void spapr_cpu_reset(void *opaque);
 void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
 int *fdt_offset, sPAPRMachineState *spapr);
+void spapr_cpu_destroy(PowerPCCPU *cpu);
 
 /*

[Qemu-devel] [RFC PATCH v1 06/10] spapr: CPU core device

2016-03-03 Thread Bharata B Rao

Add sPAPR specific CPU core device that is based on generic CPU core device.
Creating this core device will result in creation of all the CPU thread
devices that are part of this core.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/Makefile.objs|   1 +
 hw/ppc/spapr_cpu_core.c | 208 
 include/hw/ppc/spapr_cpu_core.h |  29 ++
 3 files changed, 238 insertions(+)
 create mode 100644 hw/ppc/spapr_cpu_core.c
 create mode 100644 include/hw/ppc/spapr_cpu_core.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c1ffc77..5cc6608 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -4,6 +4,7 @@ obj-y += ppc.o ppc_booke.o
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
+obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
new file mode 100644
index 0000000..3f3440c
--- /dev/null
+++ b/hw/ppc/spapr_cpu_core.c
@@ -0,0 +1,208 @@
+/*
+ * sPAPR CPU core device, acts as container of CPU thread devices.
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/cpu/core.h"
+#include "hw/ppc/spapr_cpu_core.h"
+#include "hw/ppc/spapr.h"
+#include "hw/boards.h"
+#include "qemu/error-report.h"
+#include "qapi/visitor.h"
+#include 
+
+static int spapr_cpu_core_realize_child(Object *child, void *opaque)
+{
+Error **errp = opaque;
+
+object_property_set_bool(child, true, "realized", errp);
+if (*errp) {
+return 1;
+}
+return 0;
+}
+
+static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
+{
+sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+char *slot;
+Error *local_err = NULL;
+
+if (!core->nr_threads) {
+error_setg(errp, "nr_threads property can't be 0");
+return;
+}
+
+if (!core->oc) {
+error_setg(errp, "cpu_model property isn't set");
+return;
+}
+
+/*
+ * TODO: If slot isn't specified, plug this core into
+ * an existing empty slot.
+ */
+slot = object_property_get_str(OBJECT(dev), CPU_CORE_SLOT_PROP, &local_err);
+if (!slot) {
+error_setg(errp, "slot property isn't set");
+return;
+}
+
+object_property_set_link(OBJECT(spapr), OBJECT(core), slot, &local_err);
+g_free(slot);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+object_child_foreach(OBJECT(dev), spapr_cpu_core_realize_child, errp);
+}
+
+/*
+ * This creates the CPU threads for a given @core.
+ *
+ * In order to create the threads, we need two inputs - number of
+ * threads and the cpu_model. These are set as core object's properties.
+ * When both of them become available/set, this routine will be called from
+ * either property's set handler to create the threads.
+ *
+ * TODO: Dependence of threads creation on two properties is resulting
+ * in this not-so-clean way of creating threads from either of the
+ * property setters based on the order in which they get set. Check if
+ * this can be handled in a better manner.
+ */
+static void spapr_cpu_core_create_threads(sPAPRCPUCore *core, Error **errp)
+{
+int i;
+Error *local_err = NULL;
+
+for (i = 0; i < core->nr_threads; i++) {
+char id[32];
+
+object_initialize(&core->threads[i], sizeof(core->threads[i]),
+  object_class_get_name(core->oc));
+snprintf(id, sizeof(id), "thread[%d]", i);
+object_property_add_child(OBJECT(core), id, OBJECT(&core->threads[i]),
+  &local_err);
+if (local_err) {
+goto err;
+}
+}
+return;
+
+err:
+while (--i >= 0) {
+object_unparent(OBJECT(&core->threads[i]));
+}
+error_propagate(errp, local_err);
+}
+
+static char *spapr_cpu_core_prop_get_cpu_model(Object *obj, Error **errp)
+{
+sPAPRCPUCore *core = SPAPR_CPU_CORE(obj);
+
+return g_strdup(object_class_get_name(core->oc));
+}
+
+static void spapr_cpu_core_prop_set_cpu_model(Object *obj, const char *val,
+  Error **errp)
+{
+sPAPRCPUCore *core = SPAPR_CPU_CORE(obj);
+MachineState *machine = MACHINE(qdev_get_machine());
+ObjectClass *oc = cpu_class_by_name(TYPE_POWERPC_CPU, val);
+ObjectClass *oc_base = cpu_class_by_name(TYPE_POWERPC_CPU,
+ machine->cpu_model);
+if (!oc) {
+error_setg(errp, "Unknown CPU model %s", val);
+return;
+}
+
+/*
+ * Currently

[Qemu-devel] [RFC PATCH v1 07/10] spapr: Represent boot CPUs as spapr-cpu-core devices

2016-03-03 Thread Bharata B Rao

Initialize boot CPUs as spapr-cpu-core devices and create links from
machine object to these core devices. These links can be considered
as CPU slots in which core devices will get hot-plugged. spapr-cpu-core
device's slot property indicates the slot where it is plugged. Information
about all the CPU slots can be obtained by walking these links.

Also prevent topologies that have or can result in incomplete cores.
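
The "no incomplete cores" rule amounts to requiring that the CPU counts
be whole multiples of the threads-per-core value. A small sketch of that
check, using a hypothetical helper name (the patch itself does the test
inline in ppc_spapr_init() and exits on failure):

```c
#include <assert.h>

/*
 * Return the number of whole cores for the given CPU count, or -1 if
 * the count would leave a partially populated core.
 */
static int sketch_core_count(unsigned cpus, unsigned threads_per_core)
{
    if (threads_per_core == 0 || cpus % threads_per_core) {
        return -1;
    }
    return (int)(cpus / threads_per_core);
}
```

For example, 8 CPUs with 4 threads per core give 2 cores, while 6 CPUs
with 4 threads per core are rejected.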

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c  | 85 ++---
 hw/ppc/spapr_cpu_core.c |  9 ++
 include/hw/ppc/spapr.h  |  4 +++
 3 files changed, 87 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9d4abf..5acb612 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -64,6 +64,7 @@
 
 #include "hw/compat.h"
 #include "qemu-common.h"
+#include "hw/ppc/spapr_cpu_core.h"
 
 #include 
 
@@ -1614,8 +1615,11 @@ static void spapr_boot_set(void *opaque, const char *boot_device,
 machine->boot_order = g_strdup(boot_device);
 }
 
-static void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu,
-   Error **errp)
+/*
+ * TODO: Check if some of these can be moved to rtas_start_cpu() where
+ * a few other things required for hotplugged CPUs are being done.
+ */
+void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp)
 {
 CPUPPCState *env = &cpu->env;
 
@@ -1643,7 +1647,6 @@ static void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu,
 }
 
 xics_cpu_setup(spapr->icp, cpu);
-
 qemu_register_reset(spapr_cpu_reset, cpu);
 }
 
@@ -1720,6 +1723,28 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
 }
 }
 
+/*
+ * Check to see if core is being hot-plugged into an already populated slot.
+ */
+static void spapr_cpu_core_allow_set_link(Object *obj, const char *name,
+  Object *val, Error **errp)
+{
+Object *core = object_property_get_link(qdev_get_machine(), name, NULL);
+
+/*
+ * Allow the link to be unset when the core is unplugged.
+ */
+if (!val) {
+return;
+}
+
+if (core) {
+char *path = object_get_canonical_path(core);
+error_setg(errp, "Slot %s already populated with %s", name, path);
+g_free(path);
+}
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(MachineState *machine)
 {
@@ -1728,7 +1753,6 @@ static void ppc_spapr_init(MachineState *machine)
 const char *kernel_filename = machine->kernel_filename;
 const char *kernel_cmdline = machine->kernel_cmdline;
 const char *initrd_filename = machine->initrd_filename;
-PowerPCCPU *cpu;
 PCIHostState *phb;
 int i;
 MemoryRegion *sysmem = get_system_memory();
@@ -1742,6 +1766,20 @@ static void ppc_spapr_init(MachineState *machine)
 long load_limit, fw_size;
 bool kernel_le = false;
 char *filename;
+int spapr_cores = smp_cpus / smp_threads;
+int spapr_max_cores = max_cpus / smp_threads;
+
+if (smp_cpus % smp_threads) {
+error_report("smp_cpus (%u) must be multiple of threads (%u)",
+ smp_cpus, smp_threads);
+exit(1);
+}
+
+if (max_cpus % smp_threads) {
+error_report("max_cpus (%u) must be multiple of threads (%u)",
+ max_cpus, smp_threads);
+exit(1);
+}
 
 msi_supported = true;
 
@@ -1800,13 +1838,37 @@ static void ppc_spapr_init(MachineState *machine)
 if (machine->cpu_model == NULL) {
 machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
 }
-for (i = 0; i < smp_cpus; i++) {
-cpu = cpu_ppc_init(machine->cpu_model);
-if (cpu == NULL) {
-error_report("Unable to find PowerPC CPU definition");
-exit(1);
+
+spapr->cores = g_new0(Object *, spapr_max_cores);
+
+for (i = 0; i < spapr_max_cores; i++) {
+char name[32];
+
+/*
+ * Create links from machine objects to all possible cores.
+ */
+snprintf(name, sizeof(name), "%s[%d]", SPAPR_MACHINE_CPU_CORE_PROP, i);
+object_property_add_link(OBJECT(spapr), name, TYPE_SPAPR_CPU_CORE,
+ (Object **)&spapr->cores[i],
+ spapr_cpu_core_allow_set_link,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_fatal);
+
+/*
+ * Create cores and set link from machine object to core object for
+ * boot time CPUs and realize them.
+ */
+if (i < spapr_cores) {
+Object *core  = object_new(TYPE_SPAPR_CPU_CORE);
+
+object_property_set_str(core, machine->cpu_model, "cpu_model",
+&error_fatal);
+object_property_set_int(core, smp_threads, "nr_threads",
+&error_fatal);
+object_property_set_str(core,

[Qemu-devel] [RFC PATCH v1 05/10] cpu: Abstract CPU core type

2016-03-03 Thread Bharata B Rao

Add an abstract CPU core type that could be used by machines that want
to define and hotplug CPUs in core granularity.

Signed-off-by: Bharata B Rao 
---
 hw/cpu/Makefile.objs  |  1 +
 hw/cpu/core.c | 44 
 include/hw/cpu/core.h | 30 ++
 3 files changed, 75 insertions(+)
 create mode 100644 hw/cpu/core.c
 create mode 100644 include/hw/cpu/core.h

diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
index 0954a18..942a4bb 100644
--- a/hw/cpu/Makefile.objs
+++ b/hw/cpu/Makefile.objs
@@ -2,4 +2,5 @@ obj-$(CONFIG_ARM11MPCORE) += arm11mpcore.o
 obj-$(CONFIG_REALVIEW) += realview_mpcore.o
 obj-$(CONFIG_A9MPCORE) += a9mpcore.o
 obj-$(CONFIG_A15MPCORE) += a15mpcore.o
+obj-y += core.o
 
diff --git a/hw/cpu/core.c b/hw/cpu/core.c
new file mode 100644
index 0000000..d8caf37
--- /dev/null
+++ b/hw/cpu/core.c
@@ -0,0 +1,44 @@
+/*
+ * CPU core abstract device
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/cpu/core.h"
+
+static char *core_prop_get_slot(Object *obj, Error **errp)
+{
+CPUCore *core = CPU_CORE(obj);
+
+return g_strdup(core->slot);
+}
+
+static void core_prop_set_slot(Object *obj, const char *val, Error **errp)
+{
+CPUCore *core = CPU_CORE(obj);
+
+core->slot = g_strdup(val);
+}
+
+static void cpu_core_instance_init(Object *obj)
+{
+object_property_add_str(obj, "slot", core_prop_get_slot, core_prop_set_slot,
+NULL);
+}
+
+static const TypeInfo cpu_core_type_info = {
+.name = TYPE_CPU_CORE,
+.parent = TYPE_DEVICE,
+.abstract = true,
+.instance_size = sizeof(CPUCore),
+.instance_init = cpu_core_instance_init,
+};
+
+static void cpu_core_register_types(void)
+{
+type_register_static(&cpu_core_type_info);
+}
+
+type_init(cpu_core_register_types)
diff --git a/include/hw/cpu/core.h b/include/hw/cpu/core.h
new file mode 100644
index 0000000..2daa724
--- /dev/null
+++ b/include/hw/cpu/core.h
@@ -0,0 +1,30 @@
+/*
+ * CPU core abstract device
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HW_CPU_CORE_H
+#define HW_CPU_CORE_H
+
+#include "qemu/osdep.h"
+#include "hw/qdev.h"
+
+#define TYPE_CPU_CORE "cpu-core"
+
+#define CPU_CORE(obj) \
+OBJECT_CHECK(CPUCore, (obj), TYPE_CPU_CORE)
+
+typedef struct CPUCore {
+/*< private >*/
+DeviceState parent_obj;
+
+/*< public >*/
+char *slot;
+} CPUCore;
+
+#define CPU_CORE_SLOT_PROP "slot"
+
+#endif
-- 
2.1.0

[Qemu-devel] [RFC PATCH v1 04/10] cpu: Add a sync version of cpu_remove()

2016-03-03 Thread Bharata B Rao

This sync API will be used by the CPU hotplug code to wait for the CPU to
be completely removed before flagging the failure to the device_add
command.

A sync version of this call is needed to correctly recover from CPU
realization failures when the ->plug() handler fails.
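
The wait-until-removed handshake can be sketched with plain POSIX
threads as below. This is an illustration under assumptions, not the
QEMU code: struct sketch_cpu and the sketch_* functions are
hypothetical, and the sketch uses its own mutex and a single condition
variable, whereas the patch waits on qemu_cpu_cond under the global
mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU state touched by the patch. */
struct sketch_cpu {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool exit_requested;
    bool created;
};

static void sketch_cpu_init(struct sketch_cpu *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->cond, NULL);
    cpu->exit_requested = false;
    cpu->created = true;
}

/* vCPU thread: wait for the exit request, tear down, then signal. */
static void *sketch_cpu_thread(void *arg)
{
    struct sketch_cpu *cpu = arg;

    pthread_mutex_lock(&cpu->lock);
    while (!cpu->exit_requested) {
        pthread_cond_wait(&cpu->cond, &cpu->lock);
    }
    cpu->created = false;
    pthread_cond_broadcast(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Request removal, then block until the thread reports it is gone. */
static void sketch_cpu_remove_sync(struct sketch_cpu *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->exit_requested = true;
    pthread_cond_broadcast(&cpu->cond);
    while (cpu->created) {
        pthread_cond_wait(&cpu->cond, &cpu->lock);
    }
    pthread_mutex_unlock(&cpu->lock);
}
```

The loop around pthread_cond_wait() re-checks the flag after every
wakeup, the same shape as the while (cpu->created) loop in
cpu_remove_sync().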

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 cpus.c| 12 
 include/qom/cpu.h |  8 
 2 files changed, 20 insertions(+)

diff --git a/cpus.c b/cpus.c
index 07cc054..4268334 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1067,6 +1067,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 qemu_kvm_wait_io_event(cpu);
 if (cpu->exit && !cpu_can_run(cpu)) {
 qemu_kvm_destroy_vcpu(cpu);
+cpu->created = false;
+qemu_cond_signal(&qemu_cpu_cond);
 qemu_mutex_unlock_iothread();
 return NULL;
 }
@@ -1171,6 +1173,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 if (remove_cpu) {
 qemu_tcg_destroy_vcpu(remove_cpu);
+cpu->created = false;
+qemu_cond_signal(&qemu_cpu_cond);
 remove_cpu = NULL;
 }
 }
@@ -1336,6 +1340,14 @@ void cpu_remove(CPUState *cpu)
 qemu_cpu_kick(cpu);
 }
 
+void cpu_remove_sync(CPUState *cpu)
+{
+cpu_remove(cpu);
+while (cpu->created) {
+qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
+}
+}
+
 /* For temporary buffers for forming a name */
 #define VCPU_THREAD_NAME_SIZE 16
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 6e5171b..de8600d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -765,6 +765,14 @@ void cpu_resume(CPUState *cpu);
  */
 void cpu_remove(CPUState *cpu);
 
+ /**
+ * cpu_remove_sync:
+ * @cpu: The CPU to remove.
+ *
+ * Requests the CPU to be removed and waits till it is removed.
+ */
+void cpu_remove_sync(CPUState *cpu);
+
 /**
  * qemu_init_vcpu:
  * @cpu: The vCPU to initialize.
-- 
2.1.0

[Qemu-devel] [RFC PATCH v1 09/10] xics, xics_kvm: Handle CPU unplug correctly

2016-03-03 Thread Bharata B Rao

XICS is set up for each CPU during initialization. Provide a routine
to undo this when a CPU is unplugged. While here, move ss->cs management
into xics from xics_kvm, since there is nothing KVM-specific in it.
Also ensure that xics reset doesn't set the irq for CPUs that are already
unplugged.

This allows reboot of a VM that has undergone CPU hotplug and unplug
to work correctly.

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 hw/intc/xics.c| 14 ++
 hw/intc/xics_kvm.c|  8 
 include/hw/ppc/xics.h |  1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 213a370..9fdb551 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -45,6 +45,18 @@ static int get_cpu_index_by_dt_id(int cpu_dt_id)
 return -1;
 }
 
+void xics_cpu_destroy(XICSState *icp, PowerPCCPU *cpu)
+{
+CPUState *cs = CPU(cpu);
+ICPState *ss = &icp->ss[cs->cpu_index];
+
+assert(cs->cpu_index < icp->nr_servers);
+assert(cs == ss->cs);
+
+ss->output = NULL;
+ss->cs = NULL;
+}
+
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
 {
 CPUState *cs = CPU(cpu);
@@ -54,6 +66,8 @@ void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
 
 assert(cs->cpu_index < icp->nr_servers);
 
+ss->cs = cs;
+
 if (info->cpu_setup) {
 info->cpu_setup(icp, cpu);
 }
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 9fe0667..7aab4a1 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -110,8 +110,10 @@ static void icp_kvm_reset(DeviceState *dev)
 icp->pending_priority = 0xff;
 icp->mfrr = 0xff;
 
-/* Make all outputs are deasserted */
-qemu_set_irq(icp->output, 0);
+/* Make all outputs as deasserted only if the CPU thread is in use */
+if (icp->output) {
+qemu_set_irq(icp->output, 0);
+}
 
 icp_set_kvm_state(icp, 1);
 }
@@ -344,8 +346,6 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU 
*cpu)
 if (icpkvm->kernel_xics_fd != -1) {
 int ret;
 
-ss->cs = cs;
-
 ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0,
   icpkvm->kernel_xics_fd, 
kvm_arch_vcpu_id(cs));
 if (ret < 0) {
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index f60b06a..9091054 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -167,5 +167,6 @@ int xics_alloc_block(XICSState *icp, int src, int num, bool 
lsi, bool align,
 void xics_free(XICSState *icp, int irq, int num);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
+void xics_cpu_destroy(XICSState *icp, PowerPCCPU *cpu);
 
 #endif /* __XICS_H__ */
-- 
2.1.0

[Qemu-devel] [RFC PATCH v1 02/10] exec: Do vmstate unregistration from cpu_exec_exit()

2016-03-03 Thread Bharata B Rao

cpu_exec_init() does vmstate_register and register_savevm for the CPU device.
These need to be undone from cpu_exec_exit(). These changes are needed to
support CPU hot removal.
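
The point is symmetry: whatever the init path registers, the exit path has to unregister. A minimal sketch of that idea, assuming a toy registry in place of the vmstate/savevm machinery (state_register(), state_unregister(), obj_init() and obj_exit() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* Toy registry entry: a name plus the object it describes. */
struct state_entry {
    const char *name;
    void *opaque;   /* NULL marks a free slot */
};

static struct state_entry registry[MAX_ENTRIES];

static int state_register(const char *name, void *opaque)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (registry[i].opaque == NULL) {
            registry[i].name = name;
            registry[i].opaque = opaque;
            return 0;
        }
    }
    return -1; /* registry full */
}

/* Drop every entry that belongs to 'opaque'. */
static void state_unregister(void *opaque)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (registry[i].opaque == opaque) {
            registry[i].name = NULL;
            registry[i].opaque = NULL;
        }
    }
}

static size_t state_count(void)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (registry[i].opaque != NULL) {
            n++;
        }
    }
    return n;
}

/* Init registers two pieces of state; roll back if the second fails. */
static int obj_init(void *obj)
{
    if (state_register("cpu_common", obj) < 0) {
        return -1;
    }
    if (state_register("cpu", obj) < 0) {
        state_unregister(obj);
        return -1;
    }
    return 0;
}

/* Exit undoes everything init registered for this object. */
static void obj_exit(void *obj)
{
    state_unregister(obj);
}
```

Without the exit half, a removed object would leave stale entries behind that a later save or migration would still try to visit.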

Signed-off-by: Bharata B Rao 
---
 exec.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/exec.c b/exec.c
index 7c3f747..b8eeb54 100644
--- a/exec.c
+++ b/exec.c
@@ -613,6 +613,8 @@ static void cpu_release_index(CPUState *cpu)
 
 void cpu_exec_exit(CPUState *cpu)
 {
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
 #if defined(CONFIG_USER_ONLY)
 cpu_list_lock();
 #endif
@@ -630,6 +632,16 @@ void cpu_exec_exit(CPUState *cpu)
 #if defined(CONFIG_USER_ONLY)
 cpu_list_unlock();
 #endif
+
+if (cc->vmsd != NULL) {
+vmstate_unregister(NULL, cc->vmsd, cpu);
+}
+#if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
+unregister_savevm(NULL, "cpu", cpu->env_ptr);
+#endif
+if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
+vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
+}
 }
 
 void cpu_exec_init(CPUState *cpu, Error **errp)
-- 
2.1.0

[Qemu-devel] [RFC PATCH v1 08/10] spapr: CPU hotplug support

2016-03-03 Thread Bharata B Rao

Set up device tree entries for the hotplugged CPU core and use the
existing EPOW event infrastructure to send CPU hotplug notification to
the guest.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c  | 73 -
 hw/ppc/spapr_cpu_core.c | 60 +
 hw/ppc/spapr_events.c   |  3 ++
 hw/ppc/spapr_rtas.c | 24 ++
 include/hw/ppc/spapr.h  |  4 +++
 include/hw/ppc/spapr_cpu_core.h |  2 ++
 6 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5acb612..6c4ac50 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -603,6 +603,18 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 size_t page_sizes_prop_size;
 uint32_t vcpus_per_socket = smp_threads * smp_cores;
 uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
+sPAPRDRConnector *drc;
+sPAPRDRConnectorClass *drck;
+int drc_index;
+
+if (smc->dr_cpu_enabled) {
+drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
+g_assert(drc);
+drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+drc_index = drck->get_index(drc);
+_FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
+}
 
 /* Note: we keep CI large pages off for now because a 64K capable guest
  * provisioned with large pages might otherwise try to map a qemu
@@ -987,6 +999,16 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 _FDT(spapr_drc_populate_dt(fdt, 0, NULL, SPAPR_DR_CONNECTOR_TYPE_LMB));
 }
 
+if (smc->dr_cpu_enabled) {
+int offset = fdt_path_offset(fdt, "/cpus");
+ret = spapr_drc_populate_dt(fdt, offset, NULL,
+SPAPR_DR_CONNECTOR_TYPE_CPU);
+if (ret < 0) {
+error_report("Couldn't set up CPU DR device tree properties");
+exit(1);
+}
+}
+
 _FDT((fdt_pack(fdt)));
 
 if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
@@ -1181,7 +1203,7 @@ static void ppc_spapr_reset(void)
 
 }
 
-static void spapr_cpu_reset(void *opaque)
+void spapr_cpu_reset(void *opaque)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 PowerPCCPU *cpu = opaque;
@@ -1622,6 +1644,8 @@ static void spapr_boot_set(void *opaque, const char 
*boot_device,
 void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp)
 {
 CPUPPCState *env = &cpu->env;
+CPUState *cs = CPU(cpu);
+int i;
 
 /* Set time-base frequency to 512 MHz */
 cpu_ppc_tb_init(env, TIMEBASE_FREQ);
@@ -1646,6 +1670,14 @@ void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU 
*cpu, Error **errp)
 }
 }
 
+/* Set NUMA node for the added CPUs  */
+for (i = 0; i < nb_numa_nodes; i++) {
+if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
+cs->numa_node = i;
+break;
+}
+}
+
 xics_cpu_setup(spapr->icp, cpu);
 qemu_register_reset(spapr_cpu_reset, cpu);
 }
@@ -1768,6 +1800,7 @@ static void ppc_spapr_init(MachineState *machine)
 char *filename;
 int spapr_cores = smp_cpus / smp_threads;
 int spapr_max_cores = max_cpus / smp_threads;
+int smt = kvmppc_smt_threads();
 
 if (smp_cpus % smp_threads) {
 error_report("smp_cpus (%u) must be multiple of threads (%u)",
@@ -1834,6 +1867,15 @@ static void ppc_spapr_init(MachineState *machine)
 spapr_validate_node_memory(machine, &error_fatal);
 }
 
+if (smc->dr_cpu_enabled) {
+for (i = 0; i < spapr_max_cores; i++) {
+sPAPRDRConnector *drc =
+spapr_dr_connector_new(OBJECT(spapr),
+   SPAPR_DR_CONNECTOR_TYPE_CPU, i * smt);
+qemu_register_reset(spapr_drc_reset, drc);
+}
+}
+
 /* init CPUs */
 if (machine->cpu_model == NULL) {
 machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
@@ -2267,6 +2309,27 @@ out:
 error_propagate(errp, local_err);
 }
 
+void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
+int *fdt_offset, sPAPRMachineState *spapr)
+{
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+DeviceClass *dc = DEVICE_GET_CLASS(cs);
+int id = ppc_get_vcpu_dt_id(cpu);
+void *fdt;
+int offset, fdt_size;
+char *nodename;
+
+fdt = create_device_tree(&fdt_size);
+nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
+offset = fdt_add_subnode(fdt, 0, nodename);
+
+spapr_populate_cpu_dt(cs, fdt, offset, spapr);
+g_free(nodename);
+
+*fdt_offset = offset;
+return fdt;
+}
+
 static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
@@ -2307,6 +2370,12 @@ static void

[Qemu-devel] [RFC PATCH v1 03/10] cpu: Reclaim vCPU objects

2016-03-03 Thread Bharata B Rao

From: Gu Zheng 

In order to deal with KVM vcpus (which cannot be removed without any
protection), we do not close the KVM vcpu fd on unplug. Instead we record it
in a list and mark it as stopped, so that it can be reused for a subsequent
cpu hot-add request if possible. This is also the approach that the KVM
developers suggested:
https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
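
The parking idea can be sketched independently of KVM as a small list keyed by vcpu id. This is an illustration only: park_vcpu() and unpark_vcpu() are hypothetical helpers, the fds are plain integers, and no ioctl is involved.

```c
#include <assert.h>
#include <stdlib.h>

/* One parked vCPU: its id and the fd that was kept open. */
struct parked_vcpu {
    unsigned long vcpu_id;
    int fd;
    struct parked_vcpu *next;
};

static struct parked_vcpu *parked_list;

/* On unplug: remember the fd instead of closing it. Returns 0 or -1. */
static int park_vcpu(unsigned long vcpu_id, int fd)
{
    struct parked_vcpu *p = malloc(sizeof(*p));

    if (p == NULL) {
        return -1;
    }
    p->vcpu_id = vcpu_id;
    p->fd = fd;
    p->next = parked_list;
    parked_list = p;
    return 0;
}

/*
 * On hot-add: reuse a parked fd if one exists for this id.
 * Returns the fd, or -1 if the caller has to create a fresh vCPU.
 */
static int unpark_vcpu(unsigned long vcpu_id)
{
    struct parked_vcpu **pp;

    for (pp = &parked_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            struct parked_vcpu *p = *pp;
            int fd = p->fd;

            *pp = p->next;   /* unlink the entry */
            free(p);
            return fd;
        }
    }
    return -1;
}
```

A hot-add path built on this would first try unpark_vcpu() and fall back to creating a new vCPU only when it returns -1.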

Signed-off-by: Chen Fan 
Signed-off-by: Gu Zheng 
Signed-off-by: Zhu Guihua 
Signed-off-by: Bharata B Rao 
   [- Explicit CPU_REMOVE() from qemu_kvm/tcg_destroy_vcpu()
  isn't needed as it is done from cpu_exec_exit()
- Use iothread mutex instead of global mutex during
  destroy
- Don't cleanup vCPU object from vCPU thread context
  but leave it to the callers (device_add/device_del)]
Reviewed-by: David Gibson 
---
 cpus.c   | 38 +++
 include/qom/cpu.h| 10 +
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 57 +++-
 kvm-stub.c   |  5 +
 5 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 9592163..07cc054 100644
--- a/cpus.c
+++ b/cpus.c
@@ -953,6 +953,18 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void 
*data), void *data)
 qemu_cpu_kick(cpu);
 }
 
+static void qemu_kvm_destroy_vcpu(CPUState *cpu)
+{
+if (kvm_destroy_vcpu(cpu) < 0) {
+error_report("kvm_destroy_vcpu failed");
+exit(EXIT_FAILURE);
+}
+}
+
+static void qemu_tcg_destroy_vcpu(CPUState *cpu)
+{
+}
+
 static void flush_queued_work(CPUState *cpu)
 {
 struct qemu_work_item *wi;
@@ -1053,6 +1065,11 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 }
 }
 qemu_kvm_wait_io_event(cpu);
+if (cpu->exit && !cpu_can_run(cpu)) {
+qemu_kvm_destroy_vcpu(cpu);
+qemu_mutex_unlock_iothread();
+return NULL;
+}
 }
 
 return NULL;
@@ -1108,6 +1125,7 @@ static void tcg_exec_all(void);
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
+CPUState *remove_cpu = NULL;
 
 rcu_register_thread();
 
@@ -1145,6 +1163,16 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 }
 qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+CPU_FOREACH(cpu) {
+if (cpu->exit && !cpu_can_run(cpu)) {
+remove_cpu = cpu;
+break;
+}
+}
+if (remove_cpu) {
+qemu_tcg_destroy_vcpu(remove_cpu);
+remove_cpu = NULL;
+}
 }
 
 return NULL;
@@ -1301,6 +1329,13 @@ void resume_all_vcpus(void)
 }
 }
 
+void cpu_remove(CPUState *cpu)
+{
+cpu->stop = true;
+cpu->exit = true;
+qemu_cpu_kick(cpu);
+}
+
 /* For temporary buffers for forming a name */
 #define VCPU_THREAD_NAME_SIZE 16
 
@@ -1517,6 +1552,9 @@ static void tcg_exec_all(void)
 break;
 }
 } else if (cpu->stop || cpu->stopped) {
+if (cpu->exit) {
+next_cpu = CPU_NEXT(cpu);
+}
 break;
 }
 }
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 7052eee..6e5171b 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -237,6 +237,7 @@ struct kvm_run;
  * @halted: Nonzero if the CPU is in suspended state.
  * @stop: Indicates a pending stop request.
  * @stopped: Indicates the CPU has been artificially stopped.
+ * @exit: Indicates the CPU has exited due to an unplug operation.
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *   CPU and return to its top level loop.
@@ -289,6 +290,7 @@ struct CPUState {
 bool created;
 bool stop;
 bool stopped;
+bool exit;
 bool crash_occurred;
 bool exit_request;
 uint32_t interrupt_request;
@@ -756,6 +758,14 @@ void cpu_exit(CPUState *cpu);
 void cpu_resume(CPUState *cpu);
 
 /**
+ * cpu_remove:
+ * @cpu: The CPU to remove.
+ *
+ * Requests the CPU to be removed.
+ */
+void cpu_remove(CPUState *cpu);
+
+/**
  * qemu_init_vcpu:
  * @cpu: The vCPU to initialize.
  *
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6695fa7..5d5b602 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -216,6 +216,7 @@ int kvm_has_intx_set_mask(void);
 
 int kvm_init_vcpu(CPUState *cpu);
 int kvm_cpu_exec(CPUState *cpu);
+int kvm_destroy_vcpu(CPUState *cpu);
 
 #ifdef NEED_CPU_H
 
diff --git a/kvm-all.c b/kvm-all.c
index a65e73f..b41ff46 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -61,6 +61,12 @@
 
 #define KVM_MSI_HASHTAB_SIZE256
 
+struct KVMParkedVcpu {
+unsigned long vcpu_id;
+int

[Qemu-devel] [RFC PATCH v1 00/10] Core based CPU hotplug for PowerPC sPAPR

2016-03-03 Thread Bharata B Rao

Hi,

This is the next version of "Core based CPU hotplug for PowerPC sPAPR" that
was posted at
https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00286.html

Here is a quick summary of how this approach differs from the
previous approaches that I have been pursuing, the last one being
v7, which was posted here:
https://lists.gnu.org/archive/html/qemu-ppc/2016-01/msg00477.html

- Earlier approaches used an independent PowerPC specific core device while
  this approach uses an sPAPR specific core device that is derived from
  generic core device.
- The earlier approach didn't have the notion of where the boot time as
  well as hot-plugged cores would sit. In this approach QOM links are
  created from machine object to all possible core objects. While the
  links are set for boot time cores during machine init, the same is done
  for hotplugged cores at hotplug time. The name of this link property
  is standardized as "core[N]" since these links link the machine object
  with core devices. The link name ("core[N]") is used with core device's
  "slot" property to identify the QOM link to set for this core.
  (qemu) device_add 
spapr-cpu-core,id=core2,nr_threads=8,cpu_model=host,slot=core[2]
- The earlier approach created threads from instance_init of the core object
  using a modified cpu_generic_init() routine. It used smp_threads and
  MachineState->cpu_model globals. In the current approach, nr_threads and
  cpu_model are obtained as properties and threads are created from
  property setters. The thread objects are allocated as part of core
  device and object_initialize() is used to initialize them.
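
To make the "core[N]" naming convention concrete, here is a small sketch that maps a slot string to its index N. It only illustrates the naming scheme; the series itself works with QOM link properties, and slot_to_index() is a hypothetical helper that does not appear in the patches.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns N for a slot name of the exact form
 * "core[N]" with N >= 0, and -1 for anything else.
 */
static int slot_to_index(const char *slot)
{
    int index = -1;
    int consumed = 0;

    /* %n is only reached if the closing ']' matched. */
    if (sscanf(slot, "core[%d]%n", &index, &consumed) != 1) {
        return -1;
    }
    if (consumed == 0 || slot[consumed] != '\0' || index < 0) {
        return -1;
    }
    return index;
}
```

With this, the slot=core[2] argument from the device_add example would select index 2, while malformed names are rejected.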

Some open questions remain:

- Do these device_add semantics look OK?
  (qemu) device_add 
spapr-cpu-core,id=core2,nr_threads=8,cpu_model=host,slot=core[2]
- Is there a need for machine to core object links? If not, what would
  determine the slot/location of the core device?
- How to fit this with the CPUSlotProperties HMP interface that Igor is
  working on?

This version has the following changes:

- Dropped QMP and HMP enumeration patches for now since it isn't clear
  if the approach I took would be preferable for all archs. Will wait
  and see how Igor's patches evolve here.
- Added the following pre-req patches to support CPU removal:
  exec: Remove cpu from cpus list during cpu_exec_exit()
  exec: Do vmstate unregistration from cpu_exec_exit()
  cpu: Reclaim vCPU objects
  cpu: Add a sync version of cpu_remove()
- Added sPAPR CPU hot removal support
  xics,xics_kvm: Handle CPU unplug correctly
  spapr: CPU hot unplug support
- Moved up the "slot" property into abstract cpu-core device.
- Recover when thread creation fails (spapr_cpu_core_create_threads())
- cpu_model property in spapr-cpu-core device is now tracked using
  ObjectClass pointer instead of string.
- Fail topologies with incomplete cores from within sPAPR's machine init.
- Fixes in spapr-cpu-core object creation from machine init to create
  only boottime CPUs.
- Moved board specific wiring code for CPU threads (spapr_cpu_init)
  into core's realize method to be called after each thread's realization.
- No specific action under TYPE_CPU in ->plug() handler either for hotplug
  or hot removal.
- Moved all core related CPU hotplug routines into spapr_cpu_core.c where
  they truly belong.
- Setting of NUMA node for hotplugged CPUs moved to spapr_cpu_init().
- Compared to previous implementation of hot removal that was part of
  different series earlier, this implementation moves all core removal
  logic into spapr_cpu_core.c.
- Some minor cleanups like use of g_new0 instead of g_malloc0 etc.

This patchset is present at:
https://github.com/bharata/qemu/commits/spapr-cpu-core

Bharata B Rao (9):
  exec: Remove cpu from cpus list during cpu_exec_exit()
  exec: Do vmstate unregistration from cpu_exec_exit()
  cpu: Add a sync version of cpu_remove()
  cpu: Abstract CPU core type
  spapr: CPU core device
  spapr: Represent boot CPUs as spapr-cpu-core devices
  spapr: CPU hotplug support
  xics,xics_kvm: Handle CPU unplug correctly
  spapr: CPU hot unplug support

Gu Zheng (1):
  cpu: Reclaim vCPU objects

 cpus.c  |  50 ++
 exec.c  |  44 -
 hw/cpu/Makefile.objs|   1 +
 hw/cpu/core.c   |  44 +
 hw/intc/xics.c  |  14 ++
 hw/intc/xics_kvm.c  |   8 +-
 hw/ppc/Makefile.objs|   1 +
 hw/ppc/spapr.c  | 174 +--
 hw/ppc/spapr_cpu_core.c | 361 
 hw/ppc/spapr_events.c   |   3 +
 hw/ppc/spapr_rtas.c |  24 +++
 include/hw/cpu/core.h   |  30 
 include/hw/ppc/spapr.h  |   9 +
 include/hw/ppc/spapr_cpu_core.h |  42 +
 include/hw/ppc/xics.h   |   1 +
 include/qom/cpu.h   |  18 ++
 include/sysemu/kvm.h|   1 +
 kvm-all.c   |  57 ++-
 kvm-stub.c

[Qemu-devel] [RFC PATCH v1 01/10] exec: Remove cpu from cpus list during cpu_exec_exit()

2016-03-03 Thread Bharata B Rao

CPUState *cpu gets added to the cpus list during cpu_exec_init(). It
should be removed from cpu_exec_exit().

cpu_exec_exit() is called from generic CPU::instance_finalize and some
archs like PowerPC call it from CPU unrealizefn. So ensure that we
dequeue the cpu only once.

A cpu->cpu_index value of -1 now indicates that the cpu has already been
dequeued, in the CONFIG_USER_ONLY case as well.
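
The "dequeue only once" rule can be sketched with a sentinel index on a toy list. This is an illustration, not the exec.c code: struct cpu, cpu_enqueue() and cpu_dequeue_once() are hypothetical, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy CPU object on a singly linked list. */
struct cpu {
    int cpu_index;      /* -1 means "not on the list" */
    struct cpu *next;
};

static struct cpu *cpu_list;

static void cpu_enqueue(struct cpu *cpu, int index)
{
    cpu->cpu_index = index;
    cpu->next = cpu_list;
    cpu_list = cpu;
}

/* Safe to call from more than one teardown path. */
static void cpu_dequeue_once(struct cpu *cpu)
{
    struct cpu **pp;

    if (cpu->cpu_index == -1) {
        /* Never enqueued, or already dequeued by an earlier call. */
        return;
    }
    for (pp = &cpu_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == cpu) {
            *pp = cpu->next;
            break;
        }
    }
    cpu->next = NULL;
    cpu->cpu_index = -1;
}
```

The second call returns early, so it does not matter whether unrealize, finalize, or both end up invoking the exit path.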

Signed-off-by: Bharata B Rao 
---
 exec.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index c62c439..7c3f747 100644
--- a/exec.c
+++ b/exec.c
@@ -588,15 +588,9 @@ static int cpu_get_free_index(Error **errp)
 return cpu;
 }
 
-void cpu_exec_exit(CPUState *cpu)
+static void cpu_release_index(CPUState *cpu)
 {
-if (cpu->cpu_index == -1) {
-/* cpu_index was never allocated by this @cpu or was already freed. */
-return;
-}
-
 bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
-cpu->cpu_index = -1;
 }
 #else
 
@@ -611,11 +605,33 @@ static int cpu_get_free_index(Error **errp)
 return cpu_index;
 }
 
-void cpu_exec_exit(CPUState *cpu)
+static void cpu_release_index(CPUState *cpu)
 {
+return;
 }
 #endif
 
+void cpu_exec_exit(CPUState *cpu)
+{
+#if defined(CONFIG_USER_ONLY)
+cpu_list_lock();
+#endif
+if (cpu->cpu_index == -1) {
+/* cpu_index was never allocated by this @cpu or was already freed. */
+#if defined(CONFIG_USER_ONLY)
+cpu_list_unlock();
+#endif
+return;
+}
+
+QTAILQ_REMOVE(&cpus, cpu, node);
+cpu_release_index(cpu);
+cpu->cpu_index = -1;
+#if defined(CONFIG_USER_ONLY)
+cpu_list_unlock();
+#endif
+}
+
 void cpu_exec_init(CPUState *cpu, Error **errp)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
-- 
2.1.0

Re: [Qemu-devel] [PATCH] rng: switch request queue to QSIMPLEQ

2016-03-03 Thread Amit Shah

On (Thu) 03 Mar 2016 [14:16:11], Ladi Prosek wrote:
> QSIMPLEQ supports appending to tail in O(1) and is intrusive so
> it doesn't require extra memory allocations for the bookkeeping
> data.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Ladi Prosek 

> @@ -83,24 +83,27 @@ static void rng_backend_free_request(RngRequest *req)
>  
>  static void rng_backend_free_requests(RngBackend *s)
>  {
> -GSList *i;
> +RngRequest *req, *next;
>  
> -for (i = s->requests; i; i = i->next) {
> -rng_backend_free_request(i->data);
> +QSIMPLEQ_FOREACH_SAFE(req, &s->requests, next, next) {
> +rng_backend_free_request(req);
>  }
>  
> -g_slist_free(s->requests);
> -s->requests = NULL;
> +QSIMPLEQ_INIT(&s->requests);
>  }

This init here isn't necessary, the accessors for the queue will take
care of this.

Amit
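
For readers unfamiliar with intrusive queues, the two properties named in the commit message (O(1) tail append, no separate bookkeeping allocation) can be shown with a hand-rolled version. This is a sketch, not the QSIMPLEQ macros: struct request, struct request_queue and the queue_*() functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The link lives inside the element itself (intrusive). */
struct request {
    int id;
    struct request *next;
};

/* Head keeps a pointer to the last 'next' field for O(1) append. */
struct request_queue {
    struct request *first;
    struct request **last;
};

static void queue_init(struct request_queue *q)
{
    q->first = NULL;
    q->last = &q->first;
}

/* O(1): no list walk, no extra node allocation. */
static void queue_insert_tail(struct request_queue *q, struct request *req)
{
    req->next = NULL;
    *q->last = req;
    q->last = &req->next;
}

/* Allocate a request and append it. Returns 0 on success, -1 on failure. */
static int queue_append_new(struct request_queue *q, int id)
{
    struct request *req = malloc(sizeof(*req));

    if (req == NULL) {
        return -1;
    }
    req->id = id;
    queue_insert_tail(q, req);
    return 0;
}

/* Free every element; returns how many were freed. */
static size_t queue_free_all(struct request_queue *q)
{
    struct request *req = q->first;
    size_t freed = 0;

    while (req != NULL) {
        struct request *next = req->next; /* read before freeing */
        free(req);
        freed++;
        req = next;
    }
    /* The walk above does not touch the head, so reset it explicitly. */
    queue_init(q);
    return freed;
}
```

In this hand-rolled version the head would keep pointing at freed memory if it were not reset after the walk.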

Re: [Qemu-devel] [PATCH V2 3/3] tests/test-filter-redirector: Add unit test for filter-redirector

2016-03-03 Thread Li Zhijian



On 03/02/2016 02:47 PM, Jason Wang wrote:



On 02/29/2016 08:23 PM, Zhang Chen wrote:

In this unit test, we will test the filter redirector function.

Start qemu with:

 "-netdev tap,id=qtest-bn0 "


Please don't use tap since it needs

- CAP_NET_ADMIN
- if-up script

Neither of the above can be assumed in a qtest environment.


 "-device rtl8139,netdev=qtest-bn0,id=qtest-e0 "
 "-chardev socket,id=redirector0,path=%s,server,nowait "
 "-chardev socket,id=redirector1,path=%s,server,nowait "
 "-object filter-redirector,id=qtest-f1,netdev=qtest-bn0,"
 "queue=tx,indev=redirector1 "
 "-object filter-redirector,id=qtest-f0,netdev=qtest-bn0,"
 "queue=tx,outdev=redirector0 "

We inject a packet into -chardev redirector1, then filter-redirector
passes it to the filter. Another filter-redirector gets it and
redirects it to redirector0. We read the packet from redirector0
and compare it with what we injected.


Looks correct but I think queue='rx' should also be tested here. How about:

- using backend
- redirect tx traffic to a chardev, then inject packet from socket and
read it from chardev
- redirect from another chardev to rx traffic, then inject packet from
chardev and read it from socket?



hi, Jason

IIUC, a full UT for redirector should include the following cases.
How about these:

rd[n]: redirector n

Case 1, tx traffic flow:

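qemu side              | test side
                       |
+---------+            |  +-------+
| backend <---------------+ sock0 |
+----+----+            |  +-------+
     |                 |
+----v----+  +-------+ |
|   rd0   +->+chardev| |
+---------+  +---+---+ |
                 |     |
+---------+      |     |
|   rd1   <------+     |
+----+----+            |
     |                 |
+----v----+            |  +-------+
|   rd2   +--------------->sock1  |
+---------+            |  +-------+
                       +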

--
Case 2, rx traffic flow
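qemu side              | test side
                       |
+---------+            |  +-------+
| backend +---------------> sock1 |
+----^----+            |  +-------+
     |                 |
+----+----+  +-------+ |
|   rd0   +->+chardev| |
+---------+  +---+---+ |
                 ^     |
+---------+      |     |
|   rd1   +------+     |
+----^----+            |
     |                 |
+----+----+            |  +-------+
|   rd2   <---------------+sock0  |
+---------+            |  +-------+
                       +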

Thanks
Li Zhijian

[Qemu-devel] [PATCH 0/2] target-ppc: Clean up handling of SDR1 and external HPTs

2016-03-03 Thread David Gibson

This pair of patches cleans up handling of SDR1 (master page table
pointer register for Power) and related cases with an external
(i.e. managed by qemu or KVM, rather than the guest) hash page table
(HPT).

I wouldn't push 1/2 on its own after the soft freeze, except that it
simplifies 2/2 which fixes a real regression.

David Gibson (2):
  target-ppc: Add helpers for updating a CPU's SDR1 and external HPT
  target-ppc: Eliminate kvmppc_kern_htab global

 hw/ppc/spapr.c  | 16 ++
 hw/ppc/spapr_hcall.c| 10 +++
 target-ppc/kvm.c| 15 ++
 target-ppc/kvm_ppc.h|  6 
 target-ppc/mmu-hash64.c | 80 -
 target-ppc/mmu-hash64.h | 11 ---
 target-ppc/mmu_helper.c | 13 
 7 files changed, 101 insertions(+), 50 deletions(-)

-- 
2.5.0

[Qemu-devel] [PATCH 2/2] target-ppc: Eliminate kvmppc_kern_htab global

2016-03-03 Thread David Gibson

fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM"
purports to remove a hack in the handling of hash page tables (HPTs)
managed by KVM instead of qemu.  However, it actually went in the wrong
direction.

That patch requires anything looking for an external HPT (that is one not
managed by the guest itself) to check both env->external_htab (for a qemu
managed HPT) and kvmppc_kern_htab (for a KVM managed HPT).  That's a
problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places
which need to check for an external HPT are outside that, such as
kvm_arch_get_registers().  The latter was subtly broken by the earlier
patch such that gdbstub can no longer access memory.

Basically a KVM managed HPT is much more like a qemu managed HPT than it is
like a guest managed HPT, so the original "hack" was actually on the right
track.

This partially reverts fa48b43, so we again mark a KVM managed external HPT
by putting a special but non-NULL value in env->external_htab.  It then
goes further, using that marker to eliminate the kvmppc_kern_htab global
entirely.  The ppc_hash64_set_external_hpt() helper function is extended
to set that marker if passed a NULL value (if you're setting an external
HPT, but don't have an actual HPT to set, the assumption is that it must
be a KVM managed HPT).
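
The marker scheme described above can be sketched on its own as follows. This is a simplified illustration: struct mmu_env, KVM_MANAGED_HPT and the three helpers are hypothetical names standing in for env->external_htab and the code in mmu-hash64.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Special non-NULL value that can never be a real table address. */
#define KVM_MANAGED_HPT ((void *)-1)

/* Hypothetical stand-in for the relevant part of the CPU state. */
struct mmu_env {
    void *external_htab;  /* NULL: guest managed HPT */
};

/* NULL from the caller means "external, but held by the kernel". */
static void set_external_hpt(struct mmu_env *env, void *hpt)
{
    env->external_htab = hpt ? hpt : KVM_MANAGED_HPT;
}

/* A single check covers both the qemu managed and KVM managed cases. */
static bool has_external_hpt(const struct mmu_env *env)
{
    return env->external_htab != NULL;
}

static bool hpt_is_kvm_managed(const struct mmu_env *env)
{
    return env->external_htab == KVM_MANAGED_HPT;
}
```

Code outside the MMU file then needs only the has_external_hpt() style of test, and only the access helpers must tell the marker apart from a real pointer.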

This also has some flow-on changes to the HPT access helpers, required by
the above changes.

Reported-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  |  3 +--
 hw/ppc/spapr_hcall.c| 10 +-
 target-ppc/mmu-hash64.c | 40 ++--
 target-ppc/mmu-hash64.h |  9 +++--
 4 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a88e3af..dc1f889 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1091,7 +1091,7 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
*spapr, int shift,
 }
 
 spapr->htab_shift = shift;
-kvmppc_kern_htab = true;
+spapr->htab = NULL;
 } else {
 /* kernel-side HPT not needed, allocate in userspace instead */
 size_t size = 1ULL << shift;
@@ -1106,7 +1106,6 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
*spapr, int shift,
 
 memset(spapr->htab, 0, size);
 spapr->htab_shift = shift;
-kvmppc_kern_htab = false;
 
 for (i = 0; i < size / HASH_PTE_SIZE_64; i++) {
 DIRTY_HPTE(HPTE(spapr->htab, i));
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 1733482..b2b1b93 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -122,17 +122,17 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 break;
 }
 }
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 if (index == 8) {
 return H_PTEG_FULL;
 }
 } else {
 token = ppc_hash64_start_access(cpu, pte_index);
 if (ppc_hash64_load_hpte0(cpu, token, 0) & HPTE64_V_VALID) {
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 return H_PTEG_FULL;
 }
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 }
 
 ppc_hash64_store_hpte(cpu, pte_index + index,
@@ -165,7 +165,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
target_ulong ptex,
 token = ppc_hash64_start_access(cpu, ptex);
 v = ppc_hash64_load_hpte0(cpu, token, 0);
 r = ppc_hash64_load_hpte1(cpu, token, 0);
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 
 if ((v & HPTE64_V_VALID) == 0 ||
 ((flags & H_AVPN) && (v & ~0x7fULL) != avpn) ||
@@ -288,7 +288,7 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 token = ppc_hash64_start_access(cpu, pte_index);
 v = ppc_hash64_load_hpte0(cpu, token, 0);
 r = ppc_hash64_load_hpte1(cpu, token, 0);
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 
 if ((v & HPTE64_V_VALID) == 0 ||
 ((flags & H_AVPN) && (v & ~0x7fULL) != avpn)) {
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 5507781..76510c3 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -36,10 +36,11 @@
 #endif
 
 /*
- * Used to indicate whether we have allocated htab in the
- * host kernel
+ * Used to indicate that a CPU has it's hash page table (HPT) managed
+ * within the host kernel
  */
-bool kvmppc_kern_htab;
+#define MMU_HASH64_KVM_MANAGED_HPT  ((void *)-1)
+
 /*
  * SLB handling
  */
@@ -283,7 +284,11 @@ void ppc_hash64_set_external_hpt(PowerPCCPU *cpu, void 
*hpt, int shift,
 
 cpu_synchronize_state(CPU(cpu));
 
-env->external_htab = hpt;
+if (hpt) {
+env->external_htab = hpt;
+} else {
+env->external_htab = MMU_HASH64_KVM_MANAGED_HPT;
+}

[Qemu-devel] [PATCH 1/2] target-ppc: Add helpers for updating a CPU's SDR1 and external HPT

2016-03-03 Thread David Gibson

When a Power cpu with 64-bit hash MMU has its hash page table (HPT)
pointer updated by a write to the SDR1 register we need to update some
derived variables.  Likewise, when the cpu is configured for an external
HPT (one not in the guest memory space) some derived variables need to be
updated.

Currently the logic for this is (partially) duplicated in ppc_store_sdr1()
and in spapr_cpu_reset().  In future we're going to need it in some other
places, so make some common helpers for this update.

In addition the new ppc_hash64_set_external_hpt() helper also updates
SDR1 in KVM - it's not updated by the normal runtime KVM <-> qemu CPU
synchronization.  In a sense this belongs logically in the
ppc_hash64_set_sdr1() helper, but that is called from
kvm_arch_get_registers() so can't itself call cpu_synchronize_state()
without infinite recursion.  In practice this doesn't matter because
the only other caller is TCG specific.

Currently there aren't situations where updating SDR1 at runtime in KVM
matters, but there are going to be in future.
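
As a worked example of the derived variables, the sketch below follows the arithmetic used in the diff: the low bits of SDR1 hold log2(table size) - 18, and the PTEG index mask is derived from that field (a PTEG is 8 entries of 16 bytes, so 128 bytes, hence the 7). The 0x1f field mask and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the HTABSIZE field for this sketch. */
#define SDR1_HTABSIZE_MASK 0x1fULL

/* Combine a table base with log2(table size in bytes). */
static uint64_t sdr1_encode(uint64_t base, int shift)
{
    return base | (uint64_t)(shift - 18);
}

/* Mask that turns a hash value into a PTEG index. */
static uint64_t sdr1_htab_mask(uint64_t sdr1)
{
    uint64_t htabsize = sdr1 & SDR1_HTABSIZE_MASK;

    if (htabsize > 28) {
        htabsize = 28;  /* clamp invalid sizes, as the helper does */
    }
    return (1ULL << (htabsize + 18 - 7)) - 1;
}
```

For a 16 MiB table (shift 24) the size field is 6 and the mask is 0x1ffff, which matches 2^24 / 128 = 2^17 PTEGs.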

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  | 13 ++---
 target-ppc/kvm.c| 15 +++
 target-ppc/kvm_ppc.h|  6 ++
 target-ppc/mmu-hash64.c | 42 ++
 target-ppc/mmu-hash64.h |  6 ++
 target-ppc/mmu_helper.c | 13 ++---
 6 files changed, 77 insertions(+), 18 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9d4abf..a88e3af 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1196,17 +1196,8 @@ static void spapr_cpu_reset(void *opaque)
 
 env->spr[SPR_HIOR] = 0;
 
-env->external_htab = (uint8_t *)spapr->htab;
-env->htab_base = -1;
-/*
- * htab_mask is the mask used to normalize hash value to PTEG index.
- * htab_shift is log2 of hash table size.
- * We have 8 hpte per group, and each hpte is 16 bytes.
- * ie have 128 bytes per hpte entry.
- */
-env->htab_mask = (1ULL << (spapr->htab_shift - 7)) - 1;
-env->spr[SPR_SDR1] = (target_ulong)(uintptr_t)spapr->htab |
-(spapr->htab_shift - 18);
+ppc_hash64_set_external_hpt(cpu, spapr->htab, spapr->htab_shift,
+&error_fatal);
 }
 
 static void spapr_create_nvram(sPAPRMachineState *spapr)
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index d67c169..99b3231 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2537,3 +2537,18 @@ int kvmppc_enable_hwrng(void)
 
 return kvmppc_enable_hcall(kvm_state, H_RANDOM);
 }
+
+int kvmppc_update_sdr1(PowerPCCPU *cpu)
+{
+CPUState *cs = CPU(cpu);
+
+if (!kvm_enabled()) {
+return 0; /* nothing to do */
+}
+
+/* The normal KVM_PUT_RUNTIME_STATE doesn't include SDR1, which is
+ * why we need an explicit update for it.  KVM_PUT_RESET_STATE is
+ * overkill, but this is a pretty rare operation, so it's simpler
+ * than writing a special purpose updater */
+return kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
+}
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index fd64c44..5564988 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -55,6 +55,7 @@ void kvmppc_hash64_write_pte(CPUPPCState *env, target_ulong 
pte_index,
  target_ulong pte0, target_ulong pte1);
 bool kvmppc_has_cap_fixup_hcalls(void);
 int kvmppc_enable_hwrng(void);
+int kvmppc_update_sdr1(PowerPCCPU *cpu);
 
 #else
 
@@ -246,6 +247,11 @@ static inline int kvmppc_enable_hwrng(void)
 {
 return -1;
 }
+
+static inline int kvmppc_update_sdr1(PowerPCCPU *cpu)
+{
+return 0; /* nothing to do */
+}
 #endif
 
 #ifndef CONFIG_KVM
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 9c58fbf..5507781 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -258,6 +258,48 @@ target_ulong helper_load_slb_vsid(CPUPPCState *env, 
target_ulong rb)
 /*
  * 64-bit hash table MMU handling
  */
+void ppc_hash64_set_sdr1(PowerPCCPU *cpu, target_ulong value,
+ Error **errp)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong htabsize = value & SDR_64_HTABSIZE;
+
+env->spr[SPR_SDR1] = value;
+if (htabsize > 28) {
+error_setg(errp,
+   "Invalid HTABSIZE 0x" TARGET_FMT_lx" stored in SDR1",
+   htabsize);
+htabsize = 28;
+}
+env->htab_mask = (1ULL << (htabsize + 18 - 7)) - 1;
+env->htab_base = value & SDR_64_HTABORG;
+}
+
+void ppc_hash64_set_external_hpt(PowerPCCPU *cpu, void *hpt, int shift,
+ Error **errp)
+{
+CPUPPCState *env = &cpu->env;
+Error *local_err = NULL;
+
+cpu_synchronize_state(CPU(cpu));
+
+env->external_htab = hpt;
+ppc_hash64_set_sdr1(cpu, (target_ulong)(uintptr_t)hpt | (shift - 18),
+&local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+/* Not strictly necessary, but makes it clearer that an

Re: [Qemu-devel] [Qemu-ppc] [PATCH] target-ppc/pseries: Clean up handling of KVM managed external HPTs

2016-03-03 Thread David Gibson

On Fri, Mar 04, 2016 at 01:40:42PM +1100, David Gibson wrote:
> fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM"
> purports to remove a hack in the handling of hash page tables (HPTs)
> managed by KVM instead of qemu.  However, it makes the wrong call.
> 
> That patch requires anything looking for an external HPT (that is one not
> managed by the guest itself) to check both env->external_htab (for a qemu
> managed HPT) and kvmppc_kern_htab (for a KVM managed HPT).  That's a
> problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places
> which need to check for an external HPT are outside that, such as
> kvm_arch_get_registers().  The latter was subtly broken by the earlier
> patch such that gdbstub can no longer access memory.
> 
> Basically a KVM managed HPT is much more like a qemu managed HPT than it is
> like a guest managed HPT, so the original "hack" was actually on the right
> track.
> 
> This partially reverts fa48b43, marking a KVM managed external HPT by
> putting a special but non-NULL value in env->external_htab.  It then goes
> further, using that marker to eliminate the kvmppc_kern_htab global
> entirely, and adding a ppc_hash64_set_external_hpt() helper function to
> reduce the amount of intimate knowledge of the cpu code that the pseries
> machine type needs to set this up correctly.
> 
> This also has some flow-on changes to the HPT access helpers, required by
> the above changes.
> 
> Reported-by: Greg Kurz 
> Signed-off-by: David Gibson 

Self NACK, sorry.  Realised this has a stupid omission (and also
partially overlaps with another patch I was working on).

> ---
>  hw/ppc/spapr.c  |  6 ++
>  hw/ppc/spapr_hcall.c| 10 +-
>  target-ppc/mmu-hash64.c | 46 +-
>  target-ppc/mmu-hash64.h |  9 -
>  4 files changed, 36 insertions(+), 35 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e9d4abf..d8b749c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1091,7 +1091,7 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
> *spapr, int shift,
>  }
>  
>  spapr->htab_shift = shift;
> -kvmppc_kern_htab = true;
> +spapr->htab = NULL;
>  } else {
>  /* kernel-side HPT not needed, allocate in userspace instead */
>  size_t size = 1ULL << shift;
> @@ -1106,7 +1106,6 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
> *spapr, int shift,
>  
>  memset(spapr->htab, 0, size);
>  spapr->htab_shift = shift;
> -kvmppc_kern_htab = false;
>  
>  for (i = 0; i < size / HASH_PTE_SIZE_64; i++) {
>  DIRTY_HPTE(HPTE(spapr->htab, i));
> @@ -1196,8 +1195,7 @@ static void spapr_cpu_reset(void *opaque)
>  
>  env->spr[SPR_HIOR] = 0;
>  
> -env->external_htab = (uint8_t *)spapr->htab;
> -env->htab_base = -1;
> +ppc_hash64_set_external_hpt(cpu, spapr->htab);
>  /*
>   * htab_mask is the mask used to normalize hash value to PTEG index.
>   * htab_shift is log2 of hash table size.
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 1733482..b2b1b93 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -122,17 +122,17 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  break;
>  }
>  }
> -ppc_hash64_stop_access(token);
> +ppc_hash64_stop_access(cpu, token);
>  if (index == 8) {
>  return H_PTEG_FULL;
>  }
>  } else {
>  token = ppc_hash64_start_access(cpu, pte_index);
>  if (ppc_hash64_load_hpte0(cpu, token, 0) & HPTE64_V_VALID) {
> -ppc_hash64_stop_access(token);
> +ppc_hash64_stop_access(cpu, token);
>  return H_PTEG_FULL;
>  }
> -ppc_hash64_stop_access(token);
> +ppc_hash64_stop_access(cpu, token);
>  }
>  
>  ppc_hash64_store_hpte(cpu, pte_index + index,
> @@ -165,7 +165,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
> target_ulong ptex,
>  token = ppc_hash64_start_access(cpu, ptex);
>  v = ppc_hash64_load_hpte0(cpu, token, 0);
>  r = ppc_hash64_load_hpte1(cpu, token, 0);
> -ppc_hash64_stop_access(token);
> +ppc_hash64_stop_access(cpu, token);
>  
>  if ((v & HPTE64_V_VALID) == 0 ||
>  ((flags & H_AVPN) && (v & ~0x7fULL) != avpn) ||
> @@ -288,7 +288,7 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  token = ppc_hash64_start_access(cpu, pte_index);
>  v = ppc_hash64_load_hpte0(cpu, token, 0);
>  r = ppc_hash64_load_hpte1(cpu, token, 0);
> -ppc_hash64_stop_access(token);
> +ppc_hash64_stop_access(cpu, token);
>  
>  if ((v & HPTE64_V_VALID) == 0 ||
>  ((flags & H_AVPN) && (v & ~0x7fULL) != avpn)) {
> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> index

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 16/16] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-03-03 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:41PM +1100, Alexey Kardashevskiy wrote:
> This adds support for Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification, which allows additional DMA window(s).
> 
> This implements DDW for emulated and VFIO devices. As all TCE root regions
> are mapped at 0 and 64bit long (and actual tables are child regions),
> this replaces memory_region_add_subregion() with _overlap() to make
> QEMU memory API happy.
> 
> This reserves RTAS token numbers for DDW calls.
> 
> This changes the TCE table migration descriptor to support dynamic
> tables as from now on, PHB will create as many stub TCE table objects
> as PHB can possibly support but not all of them might be initialized at
> the time of migration because DDW might or might not be requested by
> the guest.
> 
> The "ddw" property is enabled by default on a PHB but for compatibility
> the pseries-2.5 machine and older disable it.
> 
> This implements DDW for VFIO. The host kernel support is required.
> This adds a "levels" property to PHB to control the number of levels
> in the actual TCE table allocated by the host kernel, 0 is the default
> value to tell QEMU to calculate the correct value. Current hardware
> supports up to 5 levels.
> 
> The existing Linux guests try creating one additional huge DMA window
> with 64K or 16MB pages and map the entire guest RAM to it. If that succeeds,
> the guest switches to dma_direct_ops and never calls TCE hypercalls
> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
> and not waste time on map/unmap later. This adds a "dma64_win_addr"
> property which is a bus address for the 64bit window and by default
> set to 0x800... as this is what the modern POWER8 hardware
> uses and this allows having emulated and VFIO devices on the same bus.
> 
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
> 
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PCI.
> 
> TODO (which I have no idea how to implement properly):
> 1. check the host kernel actually supports SPAPR_PCI_DMA_MAX_WINDOWS
> windows and 12/16/24 page shift;

As noted in a different subthread, this information is there in the
container.

> 2. fix container::min_iova, max_iova - as for now, they are useless,
> and I'd expect IOMMU MR boundaries to serve this purpose really;

This seems to show a similar confusion of concepts to #1.
container::min_iova, container::max_iova advertise limitations of the
host IOMMU, the IOMMU MR boundaries show constraints of the guest
IOMMU.  You need to verify the guest constraints against the host
constraints.

A more flexible method than min/max iova will be necessary though, now
that the host IOMMU allows more flexible configurations than a single
window.

> 3. vfio_listener_region_add/vfio_listener_region_del do explicitly
> create/remove huge DMA window as we do not have vfio_container_ioctl()
> anymore, do we want to move these to some sort of callbacks? How, where?
> 
> Signed-off-by: Alexey Kardashevskiy 
> 
> # Conflicts:
> # include/hw/pci-host/spapr.h
> 
> # Conflicts:
> # hw/vfio/common.c
> ---
>  hw/ppc/Makefile.objs|   1 +
>  hw/ppc/spapr.c  |   7 +-
>  hw/ppc/spapr_iommu.c|  32 -
>  hw/ppc/spapr_pci.c  |  61 +++--
>  hw/ppc/spapr_rtas_ddw.c | 306 
> 
>  hw/vfio/common.c|  70 +-
>  include/hw/pci-host/spapr.h |  13 ++
>  include/hw/ppc/spapr.h  |  17 ++-
>  trace-events|   6 +
>  9 files changed, 489 insertions(+), 24 deletions(-)
>  create mode 100644 hw/ppc/spapr_rtas_ddw.c
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c1ffc77..986b36f 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -7,6 +7,7 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o 
> spapr_rng.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> +obj-$(CONFIG_PSERIES) += spapr_rtas_ddw.o
>  # PowerPC 4xx boards
>  obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>  obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e9d4abf..2473217 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2370,7 +2370,12 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
>   * pseries-2.5
>   */
>  #define SPAPR_COMPAT_2_5 \
> -HW_COMPAT_2_5
> +HW_COMPAT_2_5 \
> +{\
> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
> +.property = "ddw",\
> +.value= stringify(off),\
> +},
>  
>  static void spapr_machine_2_5_instance_options(MachineState *machine)
>  {
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 8aa2238..e32f71b 100644
> ---

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 07/16] vfio, memory: Notify IOMMU about starting/stopping being used by VFIO

2016-03-03 Thread David Gibson

On Thu, Mar 03, 2016 at 05:01:31PM +1100, Alexey Kardashevskiy wrote:
> On 03/03/2016 04:28 PM, David Gibson wrote:
> >On Tue, Mar 01, 2016 at 08:10:32PM +1100, Alexey Kardashevskiy wrote:
> >>This adds a vfio_notify() callback to inform an IOMMU (and then its owner)
> >>that VFIO started using the IOMMU. This is used by the pseries machine to
> >>enable/disable in-kernel acceleration of TCE hypercalls.
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >
> >Hmm.. the current approach of having a hook when vfio-pci devices are
> >attached is pretty ugly, but what exactly is the case that it doesn't
> >handle and this approach does?
> 
> Sorry, I am not following you here. What hook do you mean here?
> 
> My hook fixes the case when I want to enable/disable KVM acceleration,
> without these patches, I need to re-count how many vfio-pci devices are
> there and it is more ugly with PCI hotplug/unplug...
> 
> 
> >This two tiered notify system for a single bit is also kinda ugly.
> >
> >>---
> >>  hw/ppc/spapr_iommu.c   |  9 +
> >>  hw/ppc/spapr_pci.c | 14 --
> >>  hw/vfio/common.c   |  7 +++
> >>  include/exec/memory.h  |  2 ++
> >>  include/hw/ppc/spapr.h |  4 
> >>  5 files changed, 30 insertions(+), 6 deletions(-)
> >>
> >>diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >>index 8a88a74..67a8356 100644
> >>--- a/hw/ppc/spapr_iommu.c
> >>+++ b/hw/ppc/spapr_iommu.c
> >>@@ -136,6 +136,13 @@ static IOMMUTLBEntry 
> >>spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> >>  return ret;
> >>  }
> >>
> >>+static int spapr_tce_vfio_notify(MemoryRegion *iommu, bool attached)
> >>+{
> >>+sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> >>+
> >>+return spapr_tce_vfio_notify_owner(tcet->owner, tcet, attached);
> >
> >I'm guessing the "owner" is the PHB, but I'm not entirely clear.
> >
> >Could you use the QOM parent to get the PHB instead of storing it
> >explicitly?
> 
> 
> I am pretty sure I am not allowed to use the QOM parent, this is why there
> is no object_get_parent() helper.

Hmm.. I thought I had this discussion before and accessing the qom
parent from qmp was bad, but it was ok for internal code use.  But I
may be getting muddled with older qdev stuff.

> 
> >
> >>+}
> >>+
> >>  static int spapr_tce_table_post_load(void *opaque, int version_id)
> >>  {
> >>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> >>@@ -167,6 +174,7 @@ static const VMStateDescription vmstate_spapr_tce_table 
> >>= {
> >>
> >>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >>  .translate = spapr_tce_translate_iommu,
> >>+.vfio_notify = spapr_tce_vfio_notify,
> >>  };
> >>
> >>  static int spapr_tce_table_realize(DeviceState *dev)
> >>@@ -235,6 +243,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> >>uint32_t liobn)
> >>
> >>  tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
> >>  tcet->liobn = liobn;
> >>+tcet->owner = owner;
> >>
> >>  snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
> >>  object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
> >>diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >>index ee0fecf..b0cd148 100644
> >>--- a/hw/ppc/spapr_pci.c
> >>+++ b/hw/ppc/spapr_pci.c
> >>@@ -1084,6 +1084,14 @@ static int spapr_populate_pci_child_dt(PCIDevice 
> >>*dev, void *fdt, int offset,
> >>  return 0;
> >>  }
> >>
> >>+int spapr_tce_vfio_notify_owner(DeviceState *dev, sPAPRTCETable *tcet,
> >>+bool attached)
> >>+{
> >>+spapr_tce_set_need_vfio(tcet, attached);
> >
> >Hmm.. you go to the trouble of storing owner in dev, then don't
> >actually use it.
> 
> 
> Yeah, I need to clean this, I removed spapr_tce_vfio_notify_owner() from my
> working branch already and call spapr_tce_set_need_vfio() directly from
> spapr_tce_vfio_notify().

Ok.

> 
> 
> >
> >>+return 0;
> >>+}
> >>+
> >>  /* create OF node for pci device and required OF DT properties */
> >>  static int spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> >>   void *fdt, int node_offset)
> >>@@ -1118,12 +1126,6 @@ static void 
> >>spapr_phb_add_pci_device(sPAPRDRConnector *drc,
> >>  void *fdt = NULL;
> >>  int fdt_start_offset = 0, fdt_size;
> >>
> >>-if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> >>-sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
> >>-
> >>-spapr_tce_set_need_vfio(tcet, true);
> >>-}
> >>-
> >>  if (dev->hotplugged) {
> >>  fdt = create_device_tree(&fdt_size);
> >>  fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
> >>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>index 9bf4c3b..ca3fd47 100644
> >>--- a/hw/vfio/common.c
> >>+++ b/hw/vfio/common.c
> >>@@ -384,6 +384,7 @@ static void vfio_listener_region_add(MemoryListener 
> >>*listener,
> >>  QLIST_INSERT_HEAD(&container->giommu_list, giommu,

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 05/16] spapr_iommu: Add root memory region

2016-03-03 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:30PM +1100, Alexey Kardashevskiy wrote:
> We are going to have multiple DMA windows at different offsets on
> a PCI bus. For the sake of migration, we will pre-create as many TCE table
> objects as there are supported windows.
> So we need a way to map windows dynamically onto a PCI bus
> when migration of a table is completed but at this stage a TCE table
> object does not have access to a PHB to ask it to map a DMA window
> backed by just migrated TCE table.
> 
> This adds a "root" memory region (UINT64_MAX long) to the TCE object.
> This new region is mapped on a PCI bus with enabled overlapping as
> there will be one root MR per TCE table, each of them mapped at 0.
> The actual IOMMU memory region is a subregion of the root region and
> a TCE table enables/disables this subregion and maps it at
> the specific offset inside the root MR which is 1:1 mapping of
> a PCI address space.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 
> Reviewed-by: Thomas Huth 
> ---
>  hw/ppc/spapr_iommu.c   | 13 ++---
>  hw/ppc/spapr_pci.c |  5 +++--
>  include/hw/ppc/spapr.h |  2 +-
>  3 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index e66e128..ba9ddbb 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -172,10 +172,15 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  static int spapr_tce_table_realize(DeviceState *dev)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
> +Object *tcetobj = OBJECT(tcet);
> +char tmp[32];
>  
>  tcet->fd = -1;
> -memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops,
> - "iommu-spapr", 0);
> +snprintf(tmp, sizeof(tmp), "tce-root-%x", tcet->liobn);
> +memory_region_init(&tcet->root, tcetobj, tmp, UINT64_MAX);
> +
> +snprintf(tmp, sizeof(tmp), "tce-iommu-%x", tcet->liobn);
> +memory_region_init_iommu(&tcet->iommu, tcetobj, &spapr_iommu_ops, tmp, 
> 0);
>  
>  QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>  
> @@ -253,6 +258,7 @@ static void spapr_tce_table_do_enable(sPAPRTCETable *tcet)
>  
>  memory_region_set_size(&tcet->iommu,
> (uint64_t)tcet->nb_table << tcet->page_shift);
> +memory_region_add_subregion(&tcet->root, tcet->bus_offset, &tcet->iommu);
>  
>  tcet->enabled = true;
>  }
> @@ -279,6 +285,7 @@ static void spapr_tce_table_disable(sPAPRTCETable *tcet)
>  return;
>  }
>  
> +memory_region_del_subregion(&tcet->root, &tcet->iommu);
>  memory_region_set_size(&tcet->iommu, 0);
>  
>  spapr_tce_free_table(tcet->table, tcet->fd, tcet->nb_table);
> @@ -302,7 +309,7 @@ static void spapr_tce_table_unrealize(DeviceState *dev, 
> Error **errp)
>  
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet)
>  {
> -return &tcet->iommu;
> +return &tcet->root;
>  }
>  
>  static void spapr_tce_reset(DeviceState *dev)
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index c34a906..7b40687 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -822,8 +822,6 @@ static int spapr_phb_dma_window_enable(sPAPRPHBState 
> *sphb,
>  
>  spapr_tce_table_enable(tcet, page_shift, window_addr, nb_table, false);
>  
> -memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
> -spapr_tce_get_iommu(tcet));
>  return 0;
>  }
>  
> @@ -1411,6 +1409,9 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  return;
>  }
>  
> +memory_region_add_subregion(&sphb->iommu_root, 0,
> +spapr_tce_get_iommu(tcet));
> +

Logically this patch should add the _overlap() option rather than a
later one, yes?


>  /* Register default 32bit DMA window */
>  spapr_phb_dma_window_enable(sphb, sphb->dma_liobn,
>  SPAPR_TCE_PAGE_SHIFT,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 3e6bb84..bdf27ec 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -548,7 +548,7 @@ struct sPAPRTCETable {
>  bool bypass;
>  bool need_vfio;
>  int fd;
> -MemoryRegion iommu;
> +MemoryRegion root, iommu;
>  struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only 
> */
>  QLIST_ENTRY(sPAPRTCETable) list;
>  };

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 09/16] vfio: Generalize IOMMU memory listener

2016-03-03 Thread David Gibson

On Thu, Mar 03, 2016 at 05:07:33PM +1100, Alexey Kardashevskiy wrote:
> On 03/03/2016 04:36 PM, David Gibson wrote:
> >On Tue, Mar 01, 2016 at 08:10:34PM +1100, Alexey Kardashevskiy wrote:
> >>At the moment VFIOContainer uses one memory listener which listens on
> >>PCI address space for both Type1 and sPAPR IOMMUs. Soon we will need
> >>another listener to listen on RAM; this will do DMA memory
> >>pre-registration for sPAPR guests which basically pins all guest
> >>pages in the host physical RAM.
> >>
> >>This introduces VFIOMemoryListener which is wrapper for MemoryListener
> >>and stores a pointer to the container. This allows having multiple
> >>memory listeners for the same container. This replaces the existing
> >>@listener with @iommu_listener.
> >>
> >>This should cause no change in behavior.
> >
> >This is nonsense.
> >
> >The two listeners you're talking about have (or should have) both a
> >different AS they're listening on,
> 
> They do have different AS.
> 
> >*and* different notification
> >functions.
> 
> They do use totally different region_add/region_del, later in the series.
> 
> >Since they have nothing in common, there's no point trying
> >to build a common structure for them.
> 
> They use the same VFIOContainer pointer. VFIOMemoryListener is made of
> MemoryListener and VFIOContainer, and that's it.

Right, but you don't need the container pointer.  In both cases you
can locate the VFIOContainer with container_of.  It's a different
container_of invocation in each case, but since they're different
callback functions, that's no problem.
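
To make the container_of point concrete, here is a minimal standalone sketch. The Listener and Container types and the callback names are made up for illustration and are not the real VFIO or memory API definitions; the only thing it is meant to show is that two listeners embedded in the same structure can each recover the enclosing structure with their own container_of invocation, so no back-pointer is needed.

```c
#include <stddef.h>

/* Simplified container_of, in the spirit of the QEMU/Linux macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for MemoryListener. */
typedef struct Listener {
    void (*region_add)(struct Listener *l, int section);
} Listener;

/* Hypothetical stand-in for VFIOContainer with two embedded listeners. */
typedef struct Container {
    int id;
    Listener iommu_listener;   /* would listen on the PCI address space */
    Listener prereg_listener;  /* would listen on RAM */
    int last_iommu_section;
    int last_prereg_section;
} Container;

/* Each callback names the member it is embedded as. */
static void iommu_region_add(Listener *l, int section)
{
    Container *c = container_of(l, Container, iommu_listener);
    c->last_iommu_section = section;
}

static void prereg_region_add(Listener *l, int section)
{
    Container *c = container_of(l, Container, prereg_listener);
    c->last_prereg_section = section;
}

static void container_init(Container *c, int id)
{
    c->id = id;
    c->iommu_listener.region_add = iommu_region_add;
    c->prereg_listener.region_add = prereg_region_add;
    c->last_iommu_section = -1;
    c->last_prereg_section = -1;
}
```

Because the two callbacks are different functions anyway, hard-coding a different member name in each costs nothing.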

> Ok, I'll get rid of VFIOMemoryListener. It is just hard sometimes to
> understand what bits I have to reuse and which I do not, constant
> argument...

I think the arguments to try to make things re-used here were based on
a misunderstanding of what the prereg listener was for and therefore
not realizing that it has basically nothing in common with the regular
listener.

> 
> >
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>---
> >>  hw/vfio/common.c  | 41 
> >> +++--
> >>  include/hw/vfio/vfio-common.h |  9 -
> >>  2 files changed, 39 insertions(+), 11 deletions(-)
> >>
> >>diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>index ca3fd47..0e67a5a 100644
> >>--- a/hw/vfio/common.c
> >>+++ b/hw/vfio/common.c
> >>@@ -318,10 +318,10 @@ static hwaddr 
> >>vfio_container_granularity(VFIOContainer *container)
> >>  return (hwaddr)1 << ctz64(container->iova_pgsizes);
> >>  }
> >>
> >>-static void vfio_listener_region_add(MemoryListener *listener,
> >>+static void vfio_listener_region_add(VFIOMemoryListener *vlistener,
> >>   MemoryRegionSection *section)
> >>  {
> >>-VFIOContainer *container = container_of(listener, VFIOContainer, 
> >>listener);
> >>+VFIOContainer *container = vlistener->container;
> >>  hwaddr iova, end;
> >>  Int128 llend;
> >>  void *vaddr;
> >>@@ -425,10 +425,10 @@ fail:
> >>  }
> >>  }
> >>
> >>-static void vfio_listener_region_del(MemoryListener *listener,
> >>+static void vfio_listener_region_del(VFIOMemoryListener *vlistener,
> >>   MemoryRegionSection *section)
> >>  {
> >>-VFIOContainer *container = container_of(listener, VFIOContainer, 
> >>listener);
> >>+VFIOContainer *container = vlistener->container;
> >>  hwaddr iova, end;
> >>  int ret;
> >>  MemoryRegion *iommu = NULL;
> >>@@ -492,14 +492,33 @@ static void vfio_listener_region_del(MemoryListener 
> >>*listener,
> >>  }
> >>  }
> >>
> >>-static const MemoryListener vfio_memory_listener = {
> >>-.region_add = vfio_listener_region_add,
> >>-.region_del = vfio_listener_region_del,
> >>+static void vfio_iommu_listener_region_add(MemoryListener *listener,
> >>+   MemoryRegionSection *section)
> >>+{
> >>+VFIOMemoryListener *vlistener = container_of(listener, 
> >>VFIOMemoryListener,
> >>+ listener);
> >>+
> >>+vfio_listener_region_add(vlistener, section);
> >>+}
> >>+
> >>+
> >>+static void vfio_iommu_listener_region_del(MemoryListener *listener,
> >>+   MemoryRegionSection *section)
> >>+{
> >>+VFIOMemoryListener *vlistener = container_of(listener, 
> >>VFIOMemoryListener,
> >>+ listener);
> >>+
> >>+vfio_listener_region_del(vlistener, section);
> >>+}
> >>+
> >>+static const MemoryListener vfio_iommu_listener = {
> >>+.region_add = vfio_iommu_listener_region_add,
> >>+.region_del = vfio_iommu_listener_region_del,
> >>  };
> >>
> >>  static void vfio_listener_release(VFIOContainer *container)
> >>  {
> >>-memory_listener_unregister(&container->listener);
> >>+memory_listener_unregister(&container->iommu_listener.listener);
> >>  }
> >>
> >>  int vfio_mmap_region(Object *obj, VFIORegion

Re: [Qemu-devel] QCow2 compression

2016-03-03 Thread mgreger

> > I have for example a compressed cluster with an L2 entry value of 4A 
> > C0 00 00 00 3D 97 50. This would lead me to believe the cluster starts 
> > at offset 0x3D9750 and has a length of 0x2B 512-byte sectors (or 0x2B 
> > times 0x200 = 0x5600). Added to the offset this would give an end for 
> > the cluster at offset 0x3DED50. However, it is clear from looking at 
> > the image that the compressed cluster extends further, the data ending 
> > at 0x3DEDD5 and being followed by some zero padding until 0x3DEDF0 
> > where the file ends. How can I know the data extends beyond the length 
> > I calculated? Did I misunderstand the documentation somewhere? Why 
> > does the file end here versus a cluster aligned offset? 
> 
> This zero padding happens in the very last cluster in the image in order 
> to ensure that the image file is aligned to a multiple of the cluster 
> size (qcow2 images are defined to consist of "units of constant size", 
> i.e. only full clusters). 
> 
> The zeros are not part of the compressed data, though, that's why the 
> Compressed Cluster Descriptor indicates a shorter size. Had another 
> compressed cluster been written to the same image, it might have ended 
> up where you are seeing the zero padding now. (The trick with 
> compression is that multiple guest clusters can end up in a single host 
> cluster.) 
> 
 
Thanks, but the given length of 0x5600 is still short by 160 (decimal) bytes
compared to the non-zero data (which occupies an additional 133 bytes beyond
the expected end at 0x3DED50) and zero padding (an additional 27 bytes beyond
that). Could there be an off-by-one error somewhere?
The file doesn't even end on a sector boundary, let alone a cluster boundary.
 
I can replicate this easily and produce files which demonstrate what I am
seeing here.
I will try to replicate using a newer version of qemu-img. The version in
Debian stable is quite old apparently.
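
For reference, the arithmetic above can be written out as a small sketch. It assumes 64 KiB clusters (cluster_bits = 16), which is what puts the sector count at bit 54, and it assumes the reading of the compressed cluster descriptor in which the sector field counts *additional* 512-byte sectors beyond the one containing the host offset. The struct and function names are invented for this example and are not taken from the qemu sources.

```c
#include <stdint.h>

/* Decoded view of a compressed-cluster L2 entry (illustrative only). */
typedef struct CompressedDesc {
    uint64_t host_offset; /* byte offset of the compressed data */
    uint64_t nb_sectors;  /* 512-byte sectors touched, including the first */
    uint64_t size;        /* upper bound on compressed bytes */
    uint64_t end;         /* host_offset + size, always sector aligned */
} CompressedDesc;

static CompressedDesc decode_compressed_l2(uint64_t l2_entry,
                                           unsigned cluster_bits)
{
    /* Bits 0..x-1 hold the host offset, bits x..61 the sector count. */
    unsigned x = 62 - (cluster_bits - 8);
    uint64_t offset_mask = (1ULL << x) - 1;
    uint64_t sectors_mask = (1ULL << (cluster_bits - 8)) - 1;
    CompressedDesc d;

    d.host_offset = l2_entry & offset_mask;
    /* Assumption: the field stores "additional" sectors, hence the +1. */
    d.nb_sectors = ((l2_entry >> x) & sectors_mask) + 1;
    /* The data starts mid-sector, so subtract the offset within it. */
    d.size = d.nb_sectors * 512 - (d.host_offset & 511);
    d.end = d.host_offset + d.size;
    return d;
}
```

Under those assumptions the entry 4A C0 00 00 00 3D 97 50 decodes to offset 0x3D9750, 0x2B + 1 = 44 sectors, and an upper bound of 0x56B0 bytes ending at 0x3DEE00, which would cover the data seen up to 0x3DEDD5.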

Re: [Qemu-devel] [PATCH qemu] spapr-pci: Make MMIO spacing a machine property and increase it

2016-03-03 Thread Alexey Kardashevskiy


On 03/04/2016 02:39 PM, David Gibson wrote:

On Thu, Mar 03, 2016 at 12:42:53PM +1100, Alexey Kardashevskiy wrote:

The pseries machine supports multiple PHBs. Each PHB's MMIO/IO space is
mapped to the CPU address space starting at SPAPR_PCI_WINDOW_BASE plus
some offset which is calculated from PHB's index and
SPAPR_PCI_WINDOW_SPACING which is defined now as 64GB.

Since the default 32bit DMA window uses the first 2GB of MMIO space,
the amount of MMIO which the PCI devices can actually use is reduced
to 62GB. This is a problem if the user wants to use devices with
huge BARs.

For example, 2 PCI functions of an NVIDIA K80 adapter being passed through
will exceed this limit as they have 16M + 16G + 32M BARs which
(when aligned) will need 64GB.

This converts SPAPR_PCI_WINDOW_BASE and SPAPR_PCI_WINDOW_SPACING to
sPAPRMachineState properties. This uses old values for pseries machine
before 2.6 and increases the spacing to 128GB so MMIO space becomes 126GB.
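
The window arithmetic described here can be sketched as follows. This is only an illustration: the helper names are hypothetical, the layout is assumed to be a simple base + index * spacing, and the base address passed in is whatever the machine uses rather than a value taken from this patch.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Start of the CPU-visible window for the PHB with the given index,
 * assuming PHB windows are laid out back to back at a fixed spacing. */
static uint64_t phb_window_start(uint64_t base, uint64_t spacing,
                                 unsigned index)
{
    return base + (uint64_t)index * spacing;
}

/* MMIO left for BARs once the default 32bit DMA window is carved out. */
static uint64_t phb_usable_mmio(uint64_t spacing, uint64_t dma_window)
{
    return spacing - dma_window;
}
```

With a 2GB DMA window, phb_usable_mmio() gives 62GB for the old 64GB spacing and 126GB for the new 128GB spacing, matching the figures in the description.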

This changes the default value of sPAPRPHBState::mem_win_size to -1 for
pseries-2.6 and adds setup to spapr_phb_realize.

Signed-off-by: Alexey Kardashevskiy 


So, in theory I dislike the spapr_pci device reaching into the machine
type to get the spacing configuration.  But.. I don't know of a better
way to achieve the desired outcome.



We could drop @index and spacing; and request the user to specify the MMIO 
window start (at least) for every additional PHB.





A couple of other details concern me a little more.


---
  hw/ppc/spapr.c  | 43 ++-
  hw/ppc/spapr_pci.c  | 14 ++
  include/hw/pci-host/spapr.h |  4 +---
  include/hw/ppc/spapr.h  |  1 +
  4 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9d4abf..d21ad8a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -40,6 +40,7 @@
  #include "migration/migration.h"
  #include "mmu-hash64.h"
  #include "qom/cpu.h"
+#include "qapi/visitor.h"

  #include "hw/boards.h"
  #include "hw/ppc/ppc.h"
@@ -2100,6 +2101,29 @@ static void spapr_set_kvm_type(Object *obj, const char 
*value, Error **errp)
  spapr->kvm_type = g_strdup(value);
  }

+static void spapr_prop_get_uint64(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint64_t value = *(uint64_t *)opaque;
+visit_type_uint64(v, name, &value, errp);
+}
+
+static void spapr_prop_set_uint64(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint64_t value = -1;
+visit_type_uint64(v, name, &value, errp);
+*(uint64_t *)opaque = value;
+}


Pity there aren't standard helpers for this.


+static void spapr_prop_add_uint64(Object *obj, const char *name,
+  uint64_t *pval, const char *desc)
+{
+object_property_add(obj, name, "uint64", spapr_prop_get_uint64,
+spapr_prop_set_uint64, NULL, pval, NULL);
+object_property_set_description(obj, name, desc, NULL);
+}
+
  static void spapr_machine_initfn(Object *obj)
  {
  sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -2110,6 +2134,10 @@ static void spapr_machine_initfn(Object *obj)
  object_property_set_description(obj, "kvm-type",
  "Specifies the KVM virtualization mode (HV, 
PR)",
  NULL);
+spapr_prop_add_uint64(obj, "phb-mmio-base", &spapr->phb_mmio_base,
+  "Base address for PCI host bridge MMIO");
+spapr_prop_add_uint64(obj, "phb-mmio-spacing", &spapr->phb_mmio_spacing,
+  "Amount of MMIO space per PCI host bridge");


Hmm.. what happens if someone tries to change these properties at
runtime with qom-set?  That sounds bad.



What is the problem here exactly? These are the properties for new PHBs, 
if/when we add an ability to hotplug PHBs, changes to these properties will 
reflect in new PHB properties.


Likewise writing to "kvm-type" does not switch from HV to PR and vice versa.





  }

  static void spapr_machine_finalizefn(Object *obj)
@@ -2357,6 +2385,10 @@ static const TypeInfo spapr_machine_info = {
   */
  static void spapr_machine_2_6_instance_options(MachineState *machine)
  {
+sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+
+spapr->phb_mmio_base = SPAPR_PCI_WINDOW_BASE;
+spapr->phb_mmio_spacing = SPAPR_PCI_WINDOW_SPACING;
  }

  static void spapr_machine_2_6_class_options(MachineClass *mc)
@@ -2370,10 +2402,19 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
   * pseries-2.5
   */
  #define SPAPR_COMPAT_2_5 \
-HW_COMPAT_2_5
+HW_COMPAT_2_5 \
+{\
+.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
+.property = "mem_win_size",\
+.value= "0x10",\
+},

  static void spapr_machine_2_5_instance_options(MachineState *machine)
  {
+sPAPRMachineState *spapr =

Re: [Qemu-devel] [PATCH for-2.6] spapr_pci: fix multifunction hotplug

2016-03-03 Thread David Gibson

On Thu, Mar 03, 2016 at 08:50:26PM -0600, Michael Roth wrote:
> Quoting David Gibson (2016-03-03 19:18:09)
> > On Thu, Mar 03, 2016 at 03:55:36PM -0600, Michael Roth wrote:
> > > Since 3f1e147, QEMU has adopted a convention of supporting function
> > > hotplug by deferring hotplug events until func 0 is hotplugged.
> > > This is likely how management tools like libvirt would expose
> > > such support going forward.
> > > 
> > > Since sPAPR guests rely on per-func events rather than
> > > slot-based, our protocol has been to hotplug func 0 *first* to
> > > avoid cases where devices appear within guests without func 0
> > > present to avoid undefined behavior.
> > 
> > Hmm.. I would have thought PAPR guests would be able to cope with a
> > non-zero function device being plugged on its own.
> 
> Well, as far as PAPR goes nothing seems to forbid it, but for
> passthrough devices in particular there seem to be cases where
> drivers (or maybe the actual hardware?) expect function 0 to be
> present. I believe it was with some Broadcom bnx2x adapters where
> we saw some issues.
> 
> Some of it may be due to assumptions based around non-passthrough
> usage, but I thought I'd seen some verbiage in PCI spec that lent
> some weight to this being a reasonable assumption. There's this
> from PCIe 3.1, 7.5.1.5 at least:
> 
> "Multi-Function Device – When Set, indicates that the Device
> may contain multiple Functions, but not necessarily. Software is
> permitted to probe for Functions other than Function 0. When
> Clear, software must not probe for Functions other than Function
> 0 unless explicitly indicated by another mechanism, such as an
> ARI or SR-IOV Capability structure."
> 
> So for generic PCI rescan (which we still use currently) that could
> be an issue. But yah, I can see why for firmware-configured devices
> that in particular might not be an issue, since we wouldn't need to
> probe in the guest.

Ok.


> 
> > 
> > > 
> > > To remain compatible with new convention, defer hotplug in a
> > > similar manner, but then generate events in 0-first order as we
> > > did in the past. Once func 0 is present, fail any attempts to plug
> > > additional functions (as we do with PCIe).
> > > 
> > > For unplug, defer unplug operations in a similar manner, but
> > > generate unplug events such that function 0 is removed last in guest.
> > > 
> > > Signed-off-by: Michael Roth 
> > > ---
> > > Note: I'm not super-certain this is 2.6 material/soft-freeze material,
> > > as the current implementation does "work" if one orders device_adds
> > > in the manner enforced by this patch. The main reason I'm tagging as
> > > 2.6 is to avoid a future compatibility issue if/when libvirt adds support
> > > for multifunction hotplug in the manner suggested by 3f1e147. This does
> > > however guard a bit better against user error.
> > 
> > On balance, I think it is, since it does improve behaviour, rather
> > than add functionality.  I've added it to my ppc-for-2.6 branch.
> 
> Thanks!
> 
> > 
> > > 
> > >  hw/ppc/spapr_pci.c | 93 
> > > ++
> > >  1 file changed, 86 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index e8edad3..ab6dece 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -1142,14 +1142,21 @@ static void 
> > > spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
> > >  drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, 
> > > errp);
> > >  }
> > >  
> > > -static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
> > > -   PCIDevice *pdev)
> > > +static sPAPRDRConnector *spapr_phb_get_pci_func_drc(sPAPRPHBState *phb,
> > > +uint32_t busnr,
> > > +int32_t devfn)
> > >  {
> > > -uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
> > >  return spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI,
> > >  (phb->index << 16) |
> > >  (busnr << 8) |
> > > -pdev->devfn);
> > > +devfn);
> > > +}
> > > +
> > > +static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
> > > +   PCIDevice *pdev)
> > > +{
> > > +uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
> > > +return spapr_phb_get_pci_func_drc(phb, busnr, pdev->devfn);
> > >  }
> > >  
> > >  static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState *phb,
> > > @@ -1173,6 +1180,8 @@ static void spapr_phb_hot_plug_child(HotplugHandler 
> > > *plug_handler,
> > >  PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> > >  sPAPRDRConnector *drc = spapr_phb_get_pci_drc(phb, pdev);
> > >

Re: [Qemu-devel] [PATCH qemu] spapr-pci: Make MMIO spacing a machine property and increase it

2016-03-03 Thread David Gibson

On Thu, Mar 03, 2016 at 12:42:53PM +1100, Alexey Kardashevskiy wrote:
> The pseries machine supports multiple PHBs. Each PHB's MMIO/IO space is
> mapped to the CPU address space starting at SPAPR_PCI_WINDOW_BASE plus
> some offset which is calculated from PHB's index and
> SPAPR_PCI_WINDOW_SPACING which is defined now as 64GB.
> 
> Since the default 32bit DMA window is using first 2GB of MMIO space,
> the amount of MMIO which the PCI devices can actually use is reduced
> to 62GB. This is a problem if the user wants to use devices with
> huge BARs.
> 
> For example, 2 PCI functions of a NVIDIA K80 adapter being passed through
> will exceed this limit as they have 16M + 16G + 32M BARs which
> (when aligned) will need 64GB.
> 
> This converts SPAPR_PCI_WINDOW_BASE and SPAPR_PCI_WINDOW_SPACING to
> sPAPRMachineState properties. This uses old values for pseries machine
> before 2.6 and increases the spacing to 128GB so MMIO space becomes 126GB.
> 
> This changes the default value of sPAPRPHBState::mem_win_size to -1 for
> pseries-2.6 and adds setup to spapr_phb_realize.
> 
> Signed-off-by: Alexey Kardashevskiy 
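
The window arithmetic in the commit message above can be sketched as follows. This is only an illustration: phb_window_base() and phb_usable_mmio() are hypothetical helpers, and the layout (base plus index times spacing, with the first 2GB of each window taken by the default 32bit DMA window) is assumed from the description rather than copied from the patch.

```c
#include <stdint.h>

#define GiB (1024ULL * 1024 * 1024)

/* Size of the default 32-bit DMA window at the start of each PHB window. */
#define DMA32_WINDOW_SIZE (2 * GiB)

/* Hypothetical helper: CPU address where the window of PHB 'index' starts,
 * assuming a simple "base + index * spacing" layout. */
static uint64_t phb_window_base(uint64_t base, uint64_t spacing, uint32_t index)
{
    return base + (uint64_t)index * spacing;
}

/* Hypothetical helper: MMIO left for BARs once the DMA window is carved out. */
static uint64_t phb_usable_mmio(uint64_t spacing)
{
    return spacing - DMA32_WINDOW_SIZE;
}
```

With a 64GB spacing this leaves 62GB of usable MMIO per PHB, and with 128GB it leaves 126GB, matching the figures quoted above.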

So, in theory I dislike the spapr_pci device reaching into the machine
type to get the spacing configuration.  But.. I don't know of a better
way to achieve the desired outcome.

A couple of other details concern me a little more.

> ---
>  hw/ppc/spapr.c  | 43 ++-
>  hw/ppc/spapr_pci.c  | 14 ++
>  include/hw/pci-host/spapr.h |  4 +---
>  include/hw/ppc/spapr.h  |  1 +
>  4 files changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e9d4abf..d21ad8a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -40,6 +40,7 @@
>  #include "migration/migration.h"
>  #include "mmu-hash64.h"
>  #include "qom/cpu.h"
> +#include "qapi/visitor.h"
>  
>  #include "hw/boards.h"
>  #include "hw/ppc/ppc.h"
> @@ -2100,6 +2101,29 @@ static void spapr_set_kvm_type(Object *obj, const char 
> *value, Error **errp)
>  spapr->kvm_type = g_strdup(value);
>  }
>  
> +static void spapr_prop_get_uint64(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +uint64_t value = *(uint64_t *)opaque;
> +visit_type_uint64(v, name, &value, errp);
> +}
> +
> +static void spapr_prop_set_uint64(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +uint64_t value = -1;
> +visit_type_uint64(v, name, &value, errp);
> +*(uint64_t *)opaque = value;
> +}

Pity there aren't standard helpers for this.

> +static void spapr_prop_add_uint64(Object *obj, const char *name,
> +  uint64_t *pval, const char *desc)
> +{
> +object_property_add(obj, name, "uint64", spapr_prop_get_uint64,
> +spapr_prop_set_uint64, NULL, pval, NULL);
> +object_property_set_description(obj, name, desc, NULL);
> +}
> +
>  static void spapr_machine_initfn(Object *obj)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -2110,6 +2134,10 @@ static void spapr_machine_initfn(Object *obj)
>  object_property_set_description(obj, "kvm-type",
>  "Specifies the KVM virtualization mode 
> (HV, PR)",
>  NULL);
> +spapr_prop_add_uint64(obj, "phb-mmio-base", &spapr->phb_mmio_base,
> +  "Base address for PCI host bridge MMIO");
> +spapr_prop_add_uint64(obj, "phb-mmio-spacing", &spapr->phb_mmio_spacing,
> +  "Amount of MMIO space per PCI host bridge");

Hmm.. what happens if someone tries to change these properties at
runtime with qom-set?  That sounds bad.

>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -2357,6 +2385,10 @@ static const TypeInfo spapr_machine_info = {
>   */
>  static void spapr_machine_2_6_instance_options(MachineState *machine)
>  {
> +sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +
> +spapr->phb_mmio_base = SPAPR_PCI_WINDOW_BASE;
> +spapr->phb_mmio_spacing = SPAPR_PCI_WINDOW_SPACING;
>  }
>  
>  static void spapr_machine_2_6_class_options(MachineClass *mc)
> @@ -2370,10 +2402,19 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
>   * pseries-2.5
>   */
>  #define SPAPR_COMPAT_2_5 \
> -HW_COMPAT_2_5
> +HW_COMPAT_2_5 \
> +{\
> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
> +.property = "mem_win_size",\
> +.value= "0x10",\
> +},
>  
>  static void spapr_machine_2_5_instance_options(MachineState *machine)
>  {
> +sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +
> +spapr->phb_mmio_base = 0x100ULL;
> +spapr->phb_mmio_spacing = 0x10ULL;
>  }
>  
>  static void spapr_machine_2_5_class_options(MachineClass *mc)
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index

Re: [Qemu-devel] [PATCH v2 3/3] arm: implement query-gic-capability

2016-03-03 Thread Peter Xu

On Thu, Mar 03, 2016 at 01:19:47PM +0100, Andrea Bolognani wrote:
> On Thu, 2016-03-03 at 16:21 +0800, Peter Xu wrote:
> > For emulated ARM VM, only gicv2 is supported. We need to add gicv3 in
> > when emulated gicv3 ready. For KVM accelerated ARM VM, we detect the
> > capability bits using ioctls.
> > 
> > if we want to know GIC kernel capabilities, we need to make sure we have
> > enabled KVM when querying (like, with "-enable-kvm").
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  target-arm/machine.c | 48 +++-
> >  1 file changed, 47 insertions(+), 1 deletion(-)
> 
> Sorry for not catching this earlier, but I'm afraid this is not
> going to work -- libvirt doesn't pass either -enable-kvm or the
> machine option accel=kvm when probing for capabilities, which
> means that, with the current implementation, it will only get
> information about emulated GIC.
> 
> Is there a way to make probing work without requiring KVM to
> be enabled?

Ah.. If so, this is a good point...

I can do this, but it feels a bit hacky to do the ioctl()s
directly in a QMP command handler:

qmp_query_gic_capability()
{
kvm = open("/dev/kvm");
vm = ioctl(KVM_CREATE_VM);

...test create devices using KVM_CREATE_DEVICE ioctls...

close(vm);
close(kvm);
}

Rather than leveraging current KVMState stuffs (of course, I can
make things a little bit prettier than above...).
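
A slightly fuller version of the probe outlined above, written against the Linux KVM ioctl interface, might look like the following. It is a sketch only: kvm_probe_device_type() is a hypothetical helper name, error reporting is reduced to a return code, and it relies on the KVM_CREATE_DEVICE_TEST flag so that no device is actually created.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns 1 if the host kernel can create the given
 * KVM device type, 0 if it cannot, and -1 if KVM itself is unavailable.
 */
static int kvm_probe_device_type(uint32_t type)
{
    int kvm_fd, vm_fd, ret;
    struct kvm_create_device cd = {
        .type = type,
        .fd = 0,
        .flags = KVM_CREATE_DEVICE_TEST, /* test only, create nothing */
    };

    kvm_fd = open("/dev/kvm", O_RDWR);
    if (kvm_fd < 0) {
        return -1;
    }

    /* A throwaway VM, used only as a target for the device test. */
    vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0UL);
    if (vm_fd < 0) {
        close(kvm_fd);
        return -1;
    }

    ret = (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) == 0);

    close(vm_fd);
    close(kvm_fd);
    return ret;
}
```

Calling it once with KVM_DEV_TYPE_ARM_VGIC_V2 and once with KVM_DEV_TYPE_ARM_VGIC_V3 would answer the question for each GIC version without going through KVMState.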

Another way to do this might be to generalize kvm_init(). That's some
work too.

Andrea, do you know how much effort it would take to add this support to
libvirt, say, so that we can specify "accel=" or "-enable-kvm" as an extra
parameter when probing?

Or does anyone on the list have a suggestion on how to do this better?

Thanks.
Peter

Re: [Qemu-devel] [PATCH for-2.6] spapr_pci: fix multifunction hotplug

2016-03-03 Thread Michael Roth

Quoting David Gibson (2016-03-03 19:18:09)
> On Thu, Mar 03, 2016 at 03:55:36PM -0600, Michael Roth wrote:
> > Since 3f1e147, QEMU has adopted a convention of supporting function
> > hotplug by deferring hotplug events until func 0 is hotplugged.
> > This is likely how management tools like libvirt would expose
> > such support going forward.
> > 
> > Since sPAPR guests rely on per-func events rather than
> > slot-based, our protocol has been to hotplug func 0 *first* to
> > avoid cases where devices appear within guests without func 0
> > present to avoid undefined behavior.
> 
> Hmm.. I would have thought PAPR guests would be able to cope with a
> non-zero function device being plugged on its own.

Well, as far as PAPR goes nothing seems to forbid it, but for
passthrough devices in particular there seem to be cases where
drivers (or maybe the actual hardware?) expect function 0 to be
present. I believe it was with some Broadcom bnx2x adapters where
we saw some issues.

Some of it may be due to assumptions based around non-passthrough
usage, but I thought I'd seen some verbiage in the PCI spec that lent
some weight to this being a reasonable assumption. There's this
from PCIe 3.1, 7.5.1.5 at least:

"Multi-Function Device – When Set, indicates that the Device
may contain multiple Functions, but not necessarily. Software is
permitted to probe for Functions other than Function 0. When
Clear, software must not probe for Functions other than Function
0 unless explicitly indicated by another mechanism, such as an
ARI or SR-IOV Capability structure."

So for generic PCI rescan (which we still use currently) that could
be an issue. But yah, I can see why for firmware-configured devices
that in particular might not be an issue, since we wouldn't need to
probe in the guest.
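
The probing rule in that spec excerpt can be sketched as below. The Multi-Function Device bit is bit 7 of the Header Type register at config-space offset 0x0e; the macro and function names here are local to this sketch, and the ARI and SR-IOV exceptions mentioned in the spec text are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_HEADER_TYPE_OFFSET          0x0e
#define PCI_HEADER_TYPE_MULTI_FUNCTION  0x80

/*
 * Decide whether generic scanning software may probe function 'func' of a
 * slot, given the config space of function 0. ARI/SR-IOV are not handled.
 */
static bool may_probe_function(const uint8_t *fn0_config, unsigned int func)
{
    if (func == 0) {
        return true; /* function 0 may always be probed */
    }
    return (fn0_config[PCI_HEADER_TYPE_OFFSET] &
            PCI_HEADER_TYPE_MULTI_FUNCTION) != 0;
}
```

A guest doing a generic rescan therefore needs function 0 to be present, with the bit set, before it is allowed to look for the other functions.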

> 
> > 
> > To remain compatible with new convention, defer hotplug in a
> > similar manner, but then generate events in 0-first order as we
> > did in the past. Once func 0 present, fail any attempts to plug
> > additional functions (as we do with PCIe).
> > 
> > For unplug, defer unplug operations in a similar manner, but
> > generate unplug events such that function 0 is removed last in guest.
> > 
> > Signed-off-by: Michael Roth 
> > ---
> > Note: I'm not super-certain this is 2.6 material/soft-freeze material,
> > as the current implementation does "work" if one orders device_adds
> > in the manner enforced by this patch. The main reason I'm tagging as
> > 2.6 is to avoid a future compatibility issue if/when libvirt adds support
> > for multifunction hotplug in the manner suggested by 3f1e147. This does
> > however guard a bit better against user error.
> 
> On balance, I think it is, since it does improve behaviour, rather
> than add functionality.  I've added it to my ppc-for-2.6 branch.

Thanks!

> 
> > 
> >  hw/ppc/spapr_pci.c | 93 
> > ++
> >  1 file changed, 86 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index e8edad3..ab6dece 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1142,14 +1142,21 @@ static void 
> > spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
> >  drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, 
> > errp);
> >  }
> >  
> > -static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
> > -   PCIDevice *pdev)
> > +static sPAPRDRConnector *spapr_phb_get_pci_func_drc(sPAPRPHBState *phb,
> > +uint32_t busnr,
> > +int32_t devfn)
> >  {
> > -uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
> >  return spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI,
> >  (phb->index << 16) |
> >  (busnr << 8) |
> > -pdev->devfn);
> > +devfn);
> > +}
> > +
> > +static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
> > +   PCIDevice *pdev)
> > +{
> > +uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
> > +return spapr_phb_get_pci_func_drc(phb, busnr, pdev->devfn);
> >  }
> >  
> >  static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState *phb,
> > @@ -1173,6 +1180,8 @@ static void spapr_phb_hot_plug_child(HotplugHandler 
> > *plug_handler,
> >  PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> >  sPAPRDRConnector *drc = spapr_phb_get_pci_drc(phb, pdev);
> >  Error *local_err = NULL;
> > +PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
> > +uint32_t slotnr = PCI_SLOT(pdev->devfn);
> >  
> >  /* if DR is disabled we don't need to do anything in the case of
> >   * hotplug or coldplug callbacks
> > @@ -1190,13

Re: [Qemu-devel] [RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage

2016-03-03 Thread Li, Liang Z

> On Thu, Mar 03, 2016 at 06:44:28PM +0800, Liang Li wrote:
> > Get the free pages information through virtio and filter out the free
> > pages in the ram bulk stage. This can significantly reduce the total
> > live migration time as well as network traffic.
> >
> > Signed-off-by: Liang Li 
> > ---
> >  migration/ram.c | 52
> > ++--
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> 
> > @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >  DIRTY_MEMORY_MIGRATION);
> >  }
> >  memory_global_dirty_log_start();
> > +
> > +if (balloon_free_pages_support() &&
> > +balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +   &free_pages_count) == 0) {
> > +qemu_mutex_unlock_iothread();
> > +while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +  &free_pages_count) == 0) {
> > +usleep(1000);
> > +}
> > +qemu_mutex_lock_iothread();
> > +
> > +filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> > +}
> 
> IIUC, this code is synchronous wrt the guest OS balloon driver, i.e. it is
> asking the guest for free pages and waiting for a response. If the guest OS
> has crashed, this is going to mean QEMU waits forever and thus migration
> won't complete. Similarly, you need to consider that the guest OS may be
> malicious and simply never respond.
> 
> So if the migration code is going to use the guest balloon driver to get info
> about free pages it has to be done in an asynchronous manner so that
> migration can never be stalled by a slow/crashed/malicious guest driver.
> 
> Regards,
> Daniel

Really,  thanks a lot!

Liang
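
One minimal way to keep the wait loop quoted above from hanging is to give it a retry budget. The sketch below is a bounded poll, not the fully asynchronous design suggested in the review; every name in it is hypothetical, and the two demo callbacks merely stand in for a guest that never answers and one that answers on the third poll.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical poll callback: returns true once the guest has answered. */
typedef bool (*poll_fn)(void *opaque);
/* Hypothetical pause callback, e.g. a 1 ms sleep between polls; may be NULL. */
typedef void (*pause_fn)(void);

/* Poll at most max_polls times; false means "give up, migrate without it". */
static bool poll_bounded(poll_fn poll, void *opaque,
                         unsigned int max_polls, pause_fn pause)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (poll(opaque)) {
            return true;
        }
        if (pause) {
            pause();
        }
    }
    return false;
}

/* Demo: a crashed or malicious guest that never responds. */
static bool guest_never_answers(void *opaque)
{
    (void)opaque;
    return false;
}

/* Demo: a guest that responds once the counter in opaque reaches zero. */
static bool guest_answers_later(void *opaque)
{
    int *polls_left = opaque;
    return --(*polls_left) <= 0;
}
```

When the budget runs out the caller would simply fall back to treating every page as dirty, so a slow or unresponsive guest costs bandwidth but cannot stall migration.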

Re: [Qemu-devel] [Qemu-ppc] [PATCH] target-ppc: fix sync of SPR_SDR1 with KVM

2016-03-03 Thread David Gibson

On Fri, Mar 04, 2016 at 12:45:29AM +0100, Greg Kurz wrote:
> On Thu, 3 Mar 2016 15:35:07 +1100
> David Gibson  wrote:
> 
> > On Wed, Mar 02, 2016 at 11:06:19AM +1100, David Gibson wrote:
> > > On Tue, Mar 01, 2016 at 07:03:10PM +0100, Greg Kurz wrote:  
> > > > The gdbstub can't access guest memory with current master. This is what 
> > > > you
> > > > get in gdb:
> > > > 
> > > > 0x19b8 in main (argc= > > > memory
> > > > at address 0x3fffce4d3620>, argv= > > > memory
> > > > at address 0x3fffce4d3628>) at fp.c:11
> > > > 
> > > > Bisect leads to the following commit:
> > > > 
> > > > commit fa48b4328c39b2532e47efcfcba6d4031512f514
> > > > Author: David Gibson 
> > > > Date:   Tue Feb 9 09:30:21 2016 +1000
> > > > 
> > > > target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM
> > > > 
> > > > Looking at the env->external_htab users, I've spotted a behaviour 
> > > > change in
> > > > kvm_arch_get_registers(), which now always calls ppc_store_sdr1().
> > > > 
> > > > Checking kvmppc_kern_htab, like it is done in the MMU helpers, fixes the
> > > > issue.
> > > > 
> > > > Signed-off-by: Greg Kurz   
> > > 
> > > Mea culpa.  Good catch, applied to ppc-for-2.6, thanks.  
> > 
> > Ah.. wait.. this patch breaks compile for the ppc32 target.  Can you
> > fix this please.
> > 
> 
> Oops... I'm on vacation this week. Not sure I can find time before
> next monday... :\

Ok.  I've had a closer look and realized that the earlier commit
(fa48b43) was basically a bad idea.  I'll shortly post something to
accomplish its aims in a different and better way.

> 
> > > > ---
> > > >  target-ppc/kvm.c |2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > > > index d67c169ba324..dbc37f25af2b 100644
> > > > --- a/target-ppc/kvm.c
> > > > +++ b/target-ppc/kvm.c
> > > > @@ -1190,7 +1190,7 @@ int kvm_arch_get_registers(CPUState *cs)
> > > >  return ret;
> > > >  }
> > > >  
> > > > -if (!env->external_htab) {
> > > > +if (!kvmppc_kern_htab && !env->external_htab) {
> > > >  ppc_store_sdr1(env, sregs.u.s.sdr1);
> > > >  }
> > > >  
> > > >   
> > >   
> > 
> > 
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH] target-ppc/pseries: Clean up handling of KVM managed external HPTs

2016-03-03 Thread David Gibson

fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM"
purports to remove a hack in the handling of hash page tables (HPTs)
managed by KVM instead of qemu.  However, it makes the wrong call.

That patch requires anything looking for an external HPT (that is one not
managed by the guest itself) to check both env->external_htab (for a qemu
managed HPT) and kvmppc_kern_htab (for a KVM managed HPT).  That's a
problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places
which need to check for an external HPT are outside that, such as
kvm_arch_get_registers().  The latter was subtly broken by the earlier
patch such that gdbstub can no longer access memory.

Basically a KVM managed HPT is much more like a qemu managed HPT than it is
like a guest managed HPT, so the original "hack" was actually on the right
track.

This partially reverts fa48b43, marking a KVM managed external HPT by
putting a special but non-NULL value in env->external_htab.  It then goes
further, using that marker to eliminate the kvmppc_kern_htab global
entirely, and adding a ppc_hash64_set_external_hpt() helper function to
reduce the amount of intimate knowledge of the cpu code that the pseries
machine type needs to set this up correctly.

This also has some flow-on changes to the HPT access helpers, required by
the above changes.
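
The marker scheme described above can be illustrated with a small stand-alone sketch. The struct, enum and function names below are hypothetical and much simpler than the real CPU state; only the idea of a special non-NULL value of ((void *)-1) is taken from the patch, and mapping a NULL table to the marker is an assumption about how the helper behaves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Non-NULL marker that is compared against but never dereferenced. */
#define KVM_MANAGED_HPT ((void *)-1)

struct cpu_env {
    uint8_t *external_htab; /* NULL means the guest manages its own HPT */
};

enum hpt_owner { HPT_GUEST, HPT_QEMU, HPT_KVM };

/* Assumed behaviour: a NULL table means "external, but held by the kernel". */
static void set_external_hpt(struct cpu_env *env, void *hpt)
{
    env->external_htab = hpt ? hpt : KVM_MANAGED_HPT;
}

/* A single pointer test now covers both kinds of external HPT. */
static bool has_external_hpt(const struct cpu_env *env)
{
    return env->external_htab != NULL;
}

static enum hpt_owner classify_hpt(const struct cpu_env *env)
{
    if (env->external_htab == NULL) {
        return HPT_GUEST;
    }
    if (env->external_htab == KVM_MANAGED_HPT) {
        return HPT_KVM;
    }
    return HPT_QEMU;
}
```

Code outside the MMU helpers then only needs the has_external_hpt() style check and never has to know about a separate global flag.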

Reported-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  |  6 ++
 hw/ppc/spapr_hcall.c| 10 +-
 target-ppc/mmu-hash64.c | 46 +-
 target-ppc/mmu-hash64.h |  9 -
 4 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9d4abf..d8b749c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1091,7 +1091,7 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
*spapr, int shift,
 }
 
 spapr->htab_shift = shift;
-kvmppc_kern_htab = true;
+spapr->htab = NULL;
 } else {
 /* kernel-side HPT not needed, allocate in userspace instead */
 size_t size = 1ULL << shift;
@@ -1106,7 +1106,6 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
*spapr, int shift,
 
 memset(spapr->htab, 0, size);
 spapr->htab_shift = shift;
-kvmppc_kern_htab = false;
 
 for (i = 0; i < size / HASH_PTE_SIZE_64; i++) {
 DIRTY_HPTE(HPTE(spapr->htab, i));
@@ -1196,8 +1195,7 @@ static void spapr_cpu_reset(void *opaque)
 
 env->spr[SPR_HIOR] = 0;
 
-env->external_htab = (uint8_t *)spapr->htab;
-env->htab_base = -1;
+ppc_hash64_set_external_hpt(cpu, spapr->htab);
 /*
  * htab_mask is the mask used to normalize hash value to PTEG index.
  * htab_shift is log2 of hash table size.
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 1733482..b2b1b93 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -122,17 +122,17 @@ static target_ulong h_enter(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 break;
 }
 }
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 if (index == 8) {
 return H_PTEG_FULL;
 }
 } else {
 token = ppc_hash64_start_access(cpu, pte_index);
 if (ppc_hash64_load_hpte0(cpu, token, 0) & HPTE64_V_VALID) {
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 return H_PTEG_FULL;
 }
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 }
 
 ppc_hash64_store_hpte(cpu, pte_index + index,
@@ -165,7 +165,7 @@ static RemoveResult remove_hpte(PowerPCCPU *cpu, 
target_ulong ptex,
 token = ppc_hash64_start_access(cpu, ptex);
 v = ppc_hash64_load_hpte0(cpu, token, 0);
 r = ppc_hash64_load_hpte1(cpu, token, 0);
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 
 if ((v & HPTE64_V_VALID) == 0 ||
 ((flags & H_AVPN) && (v & ~0x7fULL) != avpn) ||
@@ -288,7 +288,7 @@ static target_ulong h_protect(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 token = ppc_hash64_start_access(cpu, pte_index);
 v = ppc_hash64_load_hpte0(cpu, token, 0);
 r = ppc_hash64_load_hpte1(cpu, token, 0);
-ppc_hash64_stop_access(token);
+ppc_hash64_stop_access(cpu, token);
 
 if ((v & HPTE64_V_VALID) == 0 ||
 ((flags & H_AVPN) && (v & ~0x7fULL) != avpn)) {
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 9c58fbf..88d4296 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -36,10 +36,11 @@
 #endif
 
 /*
- * Used to indicate whether we have allocated htab in the
- * host kernel
+ * Used to indicate that a CPU has its hash page table (HPT) managed
+ * within the host kernel
  */
-bool kvmppc_kern_htab;
+#define MMU_HASH64_KVM_MANAGED_HPT  ((void *)-1)
+
 /*
  * SLB handling
  */
@@ -259,6 +260,18

Re: [Qemu-devel] [RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device

2016-03-03 Thread Li, Liang Z

> On Thu,  3 Mar 2016 18:44:26 +0800
> Liang Li  wrote:
> 
> > Extend the virtio balloon device to support a new feature, this new
> > feature can help to get guest's free pages information, which can be
> > used for live migration optimization.
> 
> Do you have a spec for this, e.g. as a patch to the virtio spec?

Not yet.
> 
> >
> > Signed-off-by: Liang Li 
> > ---
> >  balloon.c   | 30 -
> >  hw/virtio/virtio-balloon.c  | 81 
> > -
> >  include/hw/virtio/virtio-balloon.h  | 17 +-
> >  include/standard-headers/linux/virtio_balloon.h |  1 +
> >  include/sysemu/balloon.h| 10 ++-
> >  5 files changed, 134 insertions(+), 5 deletions(-)
> 
> > +static int virtio_balloon_free_pages(void *opaque,
> > + unsigned long *free_pages_bitmap,
> > + unsigned long *free_pages_count)
> > +{
> > +VirtIOBalloon *s = opaque;
> > +VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +VirtQueueElement *elem = s->free_pages_vq_elem;
> > +int len;
> > +
> > +if (!balloon_free_pages_supported(s)) {
> > +return -1;
> > +}
> > +
> > +if (s->req_status == NOT_STARTED) {
> > +s->free_pages_bitmap = free_pages_bitmap;
> > +s->req_status = STARTED;
> > +s->mem_layout.low_mem =
> > + pc_get_lowmem(PC_MACHINE(current_machine));
> 
> Please don't leak pc-specific information into generic code.

I had already noticed that and just left it in for this initial RFC version.
The hard part of this solution is how to handle different architectures ...

Thanks!

Liang

Re: [Qemu-devel] [RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage

2016-03-03 Thread Li, Liang Z

> On Thu,  3 Mar 2016 18:44:28 +0800
> Liang Li  wrote:
> 
> > Get the free pages information through virtio and filter out the free
> > pages in the ram bulk stage. This can significantly reduce the total
> > live migration time as well as network traffic.
> >
> > Signed-off-by: Liang Li 
> > ---
> >  migration/ram.c | 52
> > ++--
> >  1 file changed, 46 insertions(+), 6 deletions(-)
> >
> 
> > @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >  DIRTY_MEMORY_MIGRATION);
> >  }
> >  memory_global_dirty_log_start();
> > +
> > +if (balloon_free_pages_support() &&
> > +balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +   &free_pages_count) == 0) {
> > +qemu_mutex_unlock_iothread();
> > +while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> > +  &free_pages_count) == 0) {
> > +usleep(1000);
> > +}
> > +qemu_mutex_lock_iothread();
> > +
> > +filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> 
> A general comment: Using the ballooner to get information about pages that
> can be filtered out is too limited (there may be other ways to do this; we
> might be able to use cmma on s390, for example), and I don't like hardcoding
> to a specific method.
> 
> What about the reverse approach: Code may register a handler that
> populates the free_pages_bitmap which is called during this stage?

Good suggestion, thanks!

Liang
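
The reverse approach suggested in the review could be sketched as a small registration interface. All names below are hypothetical and nothing here is existing QEMU API: the migration code would only call query_free_pages(), and the balloon driver (or a cmma-based backend on s390) would register whatever provider it has.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical provider: sets a bit for each free page, returns 0 on success. */
typedef int (*free_pages_provider)(unsigned long *bitmap, size_t nbits,
                                   void *opaque);

static free_pages_provider provider_fn;
static void *provider_opaque;

/* Called by whichever backend knows about free pages. */
static void register_free_pages_provider(free_pages_provider fn, void *opaque)
{
    provider_fn = fn;
    provider_opaque = opaque;
}

/* Called by the migration code; -1 means "no information, send every page". */
static int query_free_pages(unsigned long *bitmap, size_t nbits)
{
    if (!provider_fn) {
        return -1;
    }
    return provider_fn(bitmap, nbits, provider_opaque);
}

/* Demo provider: reports the single page number stored in opaque as free. */
static int demo_provider(unsigned long *bitmap, size_t nbits, void *opaque)
{
    size_t page = *(size_t *)opaque;

    if (page >= nbits) {
        return -1;
    }
    bitmap[page / BITS_PER_ULONG] |= 1UL << (page % BITS_PER_ULONG);
    return 0;
}
```

This keeps the migration code free of any balloon-specific calls, which is the point of the suggestion.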

Re: [Qemu-devel] [RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device

2016-03-03 Thread Li, Liang Z

> Subject: Re: [RFC qemu 2/4] virtio-balloon: Add a new feature to balloon
> device
> 
> On Thu, Mar 03, 2016 at 06:44:26PM +0800, Liang Li wrote:
> > Extend the virtio balloon device to support a new feature, this new
> > feature can help to get guest's free pages information, which can be
> > used for live migration optimization.
> >
> > Signed-off-by: Liang Li 
> 
> I don't understand why we need a new interface.
> Balloon already sends free pages to host.
> Just teach host to skip these pages.
> 

I just make use of the current virtio-balloon implementation; it's more
complicated to invent a new virtio-io device...
Actually, there is no need to inflate the balloon before live migration, so
the host has no information about the guest's free pages; that's why I added
a new one.

> Maybe instead of starting with code, you should send a high level description
> to the virtio tc for consideration?
> 
> You can do it through the mailing list or using the web form:
> http://www.oasis-open.org/committees/comments/form.php?wg_abbrev=virtio
> 

Thanks for your information and suggestion.

Liang

Re: [Qemu-devel] [PATCH v2 2/3] arm: qmp: add query-gic-capability interface

2016-03-03 Thread Peter Xu

On Thu, Mar 03, 2016 at 12:55:51PM +0100, Andrew Jones wrote:
> On Thu, Mar 03, 2016 at 04:21:11PM +0800, Peter Xu wrote:
> > +
> > +GICCapabilityList *qmp_query_gic_capability(Error **errp);
> 
> I don't know anything about QMP, so just offering a superficial
> review comment. Is the prototype necessary here? It seems redundant,
> considering the function is defined right below.
> 
> drew

I added this to avoid a "missing prototype" warning. However, I found that
the correct way to do it is probably to include "qmp-commands.h" in
target-arm/machine.c. Thanks for pointing this out! Will fix.

Peter

Re: [Qemu-devel] [PATCH for-2.6] spapr_pci: fix multifunction hotplug

2016-03-03 Thread David Gibson

On Thu, Mar 03, 2016 at 03:55:36PM -0600, Michael Roth wrote:
> Since 3f1e147, QEMU has adopted a convention of supporting function
> hotplug by deferring hotplug events until func 0 is hotplugged.
> This is likely how management tools like libvirt would expose
> such support going forward.
> 
> Since sPAPR guests rely on per-func events rather than
> slot-based, our protocol has been to hotplug func 0 *first* to
> avoid cases where devices appear within guests without func 0
> present to avoid undefined behavior.

Hmm.. I would have thought PAPR guests would be able to cope with a
non-zero function device being plugged on its own.

> 
> To remain compatible with new convention, defer hotplug in a
> similar manner, but then generate events in 0-first order as we
> did in the past. Once func 0 present, fail any attempts to plug
> additional functions (as we do with PCIe).
> 
> For unplug, defer unplug operations in a similar manner, but
> generate unplug events such that function 0 is removed last in guest.
> 
> Signed-off-by: Michael Roth 
> ---
> Note: I'm not super-certain this is 2.6 material/soft-freeze material,
> as the current implementation does "work" if one orders device_adds
> in the manner enforced by this patch. The main reason I'm tagging as
> 2.6 is to avoid a future compatibility issue if/when libvirt adds support
> for multifunction hotplug in the manner suggested by 3f1e147. This does
> however guard a bit better against user error.

On balance, I think it is, since it does improve behaviour, rather
than add functionality.  I've added it to my ppc-for-2.6 branch.
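
The event ordering described in the commit message can be sketched as follows. The helpers are hypothetical and work on a per-slot bitmask of present functions rather than on real DRC objects; the descending order chosen for the non-zero functions on unplug is an assumption, since the text only requires that function 0 goes last.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_MAX_FUNCS 8

/*
 * Plug: nothing is signalled until function 0 is present, then events are
 * generated in 0-first order. Returns the number of events written to order.
 */
static int plug_event_order(uint8_t present_mask, uint8_t order[PCI_MAX_FUNCS])
{
    int n = 0;

    if (!(present_mask & 1u)) {
        return 0; /* function 0 missing: keep deferring */
    }
    for (int fn = 0; fn < PCI_MAX_FUNCS; fn++) {
        if (present_mask & (1u << fn)) {
            order[n++] = (uint8_t)fn;
        }
    }
    return n;
}

/* Unplug: walk the functions downwards so that function 0 is removed last. */
static int unplug_event_order(uint8_t present_mask, uint8_t order[PCI_MAX_FUNCS])
{
    int n = 0;

    for (int fn = PCI_MAX_FUNCS - 1; fn >= 0; fn--) {
        if (present_mask & (1u << fn)) {
            order[n++] = (uint8_t)fn;
        }
    }
    return n;
}

/* Once function 0 is present, further functions can no longer be added. */
static bool may_hotplug_function(uint8_t present_mask, unsigned int fn)
{
    if (fn >= PCI_MAX_FUNCS || (present_mask & (1u << fn))) {
        return false; /* invalid or already occupied */
    }
    return fn == 0 || !(present_mask & 1u);
}
```

For a slot holding functions 0 to 2 this yields plug events 0, 1, 2 and unplug events 2, 1, 0.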

> 
>  hw/ppc/spapr_pci.c | 93 
> ++
>  1 file changed, 86 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index e8edad3..ab6dece 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1142,14 +1142,21 @@ static void 
> spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
>  drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, 
> errp);
>  }
>  
> -static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
> -   PCIDevice *pdev)
> +static sPAPRDRConnector *spapr_phb_get_pci_func_drc(sPAPRPHBState *phb,
> +uint32_t busnr,
> +int32_t devfn)
>  {
> -uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
>  return spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI,
>  (phb->index << 16) |
>  (busnr << 8) |
> -pdev->devfn);
> +devfn);
> +}
> +
> +static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
> +   PCIDevice *pdev)
> +{
> +uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
> +return spapr_phb_get_pci_func_drc(phb, busnr, pdev->devfn);
>  }
>  
>  static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState *phb,
> @@ -1173,6 +1180,8 @@ static void spapr_phb_hot_plug_child(HotplugHandler 
> *plug_handler,
>  PCIDevice *pdev = PCI_DEVICE(plugged_dev);
>  sPAPRDRConnector *drc = spapr_phb_get_pci_drc(phb, pdev);
>  Error *local_err = NULL;
> +PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
> +uint32_t slotnr = PCI_SLOT(pdev->devfn);
>  
>  /* if DR is disabled we don't need to do anything in the case of
>   * hotplug or coldplug callbacks
> @@ -1190,13 +1199,44 @@ static void spapr_phb_hot_plug_child(HotplugHandler 
> *plug_handler,
>  
>  g_assert(drc);
>  
> +/* Following the QEMU convention used for PCIe multifunction
> + * hotplug, we do not allow functions to be hotplugged to a
> + * slot that already has function 0 present
> + */
> +if (plugged_dev->hotplugged && bus->devices[PCI_DEVFN(slotnr, 0)] &&
> +PCI_FUNC(pdev->devfn) != 0) {
> +error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
> +   " additional functions can no longer be exposed to guest.",
> +   slotnr, bus->devices[PCI_DEVFN(slotnr, 0)]->name);
> +return;
> +}
> +
> +spapr_phb_add_pci_device(drc, phb, pdev, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
>  }
> -if (plugged_dev->hotplugged) {
> -spapr_hotplug_req_add_by_index(drc);
> +
> +/* If this is function 0, signal hotplug for all the device functions.
> + * Otherwise defer sending the hotplug event.
> + */
> +if (plugged_dev->hotplugged && PCI_FUNC(pdev->devfn) == 0) {
> +int i;
> +
> +for (i = 0; i < 8; i++) {
> +sPAPRDRConnector *func_drc;
> +sPAPRDRConnectorClass *func_drck;
> +

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-03 Thread Li, Liang Z

> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
> 
> * Liang Li (liang.z...@intel.com) wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > will be processed, and that takes quite a lot of CPU cycles.
> >
> > From the guest's point of view, it doesn't care about the content of free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage; this can save a lot of CPU cycles, reduce the
> > network traffic significantly, and noticeably speed up the live
> > migration process.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
> 
> Hi,
>   An interesting solution; I know a few different people have been looking at
> how to speed up ballooned VM migration.
> 

Ooh, different solutions for the same purpose, and both based on the balloon.

>   I wonder if it would be possible to avoid the kernel changes by parsing
> /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> pages in the guest ram, would it achieve the same result?
> 

Detecting only the unmapped/zero-mapped pages is not enough. In a situation
like case 2, it can't achieve the same result.

> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
> 
> For postcopy to be safe, you would still need to send a message to the
> destination telling it that there were zero pages, otherwise the destination
> can't tell if it's supposed to request the page from the source or treat the
> page as zero.
> 
> Dave

I will consider this later, thanks, Dave.

Liang

> 
> >
> > Performance data
> > ================
> >
> > Test environment:
> >
> > CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
> > Host RAM: 64GB
> > Host Linux Kernel:  4.2.0
> > Host OS: CentOS 7.1
> > Guest Linux Kernel:  4.5.rc6
> > Guest OS: CentOS 6.6
> > Network:  X540-AT2 with 10 Gigabit connection
> > Guest RAM: 8GB
> >
> > Case 1: Idle guest just boots:
> > ---------------------------------------------
> >                     | original  |    pv
> > ---------------------------------------------
> > total time(ms)      |     1894  |     421
> > transferred ram(KB) |   398017  |  353242
> > ---------------------------------------------
> >
> > Case 2: The guest has ever run some memory consuming workload, the
> > workload is terminated just before live migration.
> > ---------------------------------------------
> >                     | original  |    pv
> > ---------------------------------------------
> > total time(ms)      |     7436  |     552
> > transferred ram(KB) |  8146291  |  361375
> > ---------------------------------------------
> >

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-03 Thread Li, Liang Z

> On Thu, Mar 03, 2016 at 06:44:24PM +0800, Liang Li wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > network traffic significantly while speed up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
> >
> > Performance data
> > ================
> >
> > Test environment:
> >
> > CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
> > Host RAM: 64GB
> > Host Linux Kernel:  4.2.0
> > Host OS: CentOS 7.1
> > Guest Linux Kernel:  4.5.rc6
> > Guest OS: CentOS 6.6
> > Network:  X540-AT2 with 10 Gigabit connection
> > Guest RAM: 8GB
> >
> > Case 1: Idle guest just boots:
> > ---------------------------------------------
> >                     | original  |    pv
> > ---------------------------------------------
> > total time(ms)      |     1894  |     421
> > transferred ram(KB) |   398017  |  353242
> > ---------------------------------------------
> >
> > Case 2: The guest has ever run some memory consuming workload, the
> > workload is terminated just before live migration.
> > ---------------------------------------------
> >                     | original  |    pv
> > ---------------------------------------------
> > total time(ms)      |     7436  |     552
> > transferred ram(KB) |  8146291  |  361375
> > ---------------------------------------------
> 
> Both cases look very artificial to me.  Normally you migrate VMs which have
> started long ago and which can't have their services terminated before the
> migration, so I wouldn't expect any useful amount of free pages obtained
> this way.
> 

Yes, it's somewhat artificial, just to emphasize the effect. And I think these
two cases are very easy to reproduce. Using a real workload and running the
test in a production environment would be more convincing.

We can predict that as long as the guest doesn't use up all of its memory, this
solution may still take effect and shorten the total live migration time. (Of
course, we should consider the time cost of the virtio communication.)

> OTOH I don't see why you can't just inflate the balloon before the migration,
> and really optimize the amount of transferred data this way?
> With the recently proposed VIRTIO_BALLOON_S_AVAIL you can have a fairly
> good estimate of the optimal balloon size, and with the recently merged
> balloon deflation on OOM it's a safe thing to do without exposing the guest
> workloads to OOM risks.
> 
> Roman.

Thanks for your information. The free page bitmap is not very large: for a
guest with 8GB of RAM, only 256KB of extra memory is required.
Compared to this solution, inflating the balloon is more expensive. If the
balloon size is not optimal and the guest requests more memory during live
migration, the guest's performance will be impacted.
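
The 256KB figure follows from simple arithmetic. A minimal sketch, assuming
4KB guest pages and one bit per page (the helper name is only for
illustration and is not part of the patch set):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a bitmap holding one bit per guest page.
 * Illustrative helper: the page size is an assumption passed in
 * by the caller (4096 for the 4KB pages assumed above).
 */
static uint64_t free_page_bitmap_bytes(uint64_t guest_ram_bytes,
                                       uint64_t page_size)
{
    uint64_t pages;

    assert(page_size > 0);
    pages = guest_ram_bytes / page_size;
    return (pages + 7) / 8;   /* round up to whole bytes */
}
```

For 8GB of RAM this gives 2^33 / 2^12 = 2^21 pages, i.e. 2^21 bits or
2^18 bytes = 256KB.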

Liang

Re: [Qemu-devel] [PATCH v2 00/13] Introduce Intel 82574 GbE Controller Emulation (e1000e)

2016-03-03 Thread Jason Wang



On 03/03/2016 06:02 PM, Leonid Bloch wrote:
> Greetings Qemu-Devel,
>
> I am wondering if any of you have further comments on the series in issue.
>
> Links to individual patches are attached, for convenience.
>
> Kind regards,
> Leonid.

Hello Leonid:

I've begun the review. But considering the series is rather large
(thousands of lines), it still needs some time. I will try to give
feedback next week.

Thanks

>
> http://patchwork.ozlabs.org/patch/586418
> http://patchwork.ozlabs.org/patch/586422
> http://patchwork.ozlabs.org/patch/586427
> http://patchwork.ozlabs.org/patch/586421
> http://patchwork.ozlabs.org/patch/586420
> http://patchwork.ozlabs.org/patch/586430
> http://patchwork.ozlabs.org/patch/586426
> http://patchwork.ozlabs.org/patch/586431
> http://patchwork.ozlabs.org/patch/586423
> http://patchwork.ozlabs.org/patch/586419
> http://patchwork.ozlabs.org/patch/586429
> http://patchwork.ozlabs.org/patch/586424
> http://patchwork.ozlabs.org/patch/586432
>
> On Mon, Feb 22, 2016 at 7:37 PM, Leonid Bloch
>  wrote:
>> Hello All,
>>
>> This is v2 of the patches, after the initial reviews.
>>
>> For convenience, the same patches are available at:
>> https://github.com/daynix/qemu-e1000e/tree/e1000e-submit-v2
>>
>> Best regards,
>> Leonid.
>>
>> Changes since v1:
>>
>> 1. PCI_PM_CAP_VER_1_1 is defined now in include/hw/pci/pci_regs.h and
>>not in include/standard-headers/linux/pci_regs.h.
>> 2. Changes in naming and extra comments in hw/pci/pcie.c and in
>>include/hw/pci/pcie.h.
>> 3. Defining pci_dsn_ver and pci_dsn_cap static const variables in
>>hw/pci/pcie.c, instead of PCI_DSN_VER and PCI_DSN_CAP symbolic
>>constants in include/hw/pci/pcie_regs.h.
>> 4. Changing the vmxnet3_device_serial_num function in hw/net/vmxnet3.c
>>to avoid the cast when it is called.
>> 5. Avoiding a preceding underscore in all the e1000e-related names.
>> 6. Minor style changes.
>>
>> ===
>>
>> Hello All,
>>
>> This series is the final code of the e1000e device emulation, that we
>> have developed. Please review, and consider acceptance of these patches
>> to the upstream QEMU repository.
>>
>> The code stability was verified by various traffic tests using Fedora 22
>> Linux, and Windows Server 2012R2 guests. Also, Microsoft Hardware
>> Certification Kit (HCK) tests were run on a Windows Server 2012R2 guest.
>>
>> There was a discussion on the possibility of code sharing between the
>> e1000e, and the existing e1000 devices. We have reviewed the final code
>> for parts that may be shared between this device and the currently
>> available e1000 emulation. The device specifications are very different,
>> and there are almost no registers, nor functions, that were left as is
>> from e1000. The ring descriptor structures were changed as well, by the
>> introduction of extended and PS descriptors, as well as additional bits.
>>
>> Additional differences stem from the fact that the e1000e device re-uses
>> network packet abstractions introduced by the vmxnet3 device, while the
>> e1000 has its own code for packet handling. BTW, it may be worth reusing
>> those abstractions in e1000 as well. (Following these changes the
>> vmxnet3 device was successfully tested for possible regressions.)
>>
>> There are a few minor parts that may be shared, e.g. the default
>> register handlers, and the ring management functions. The total amount
>> of shared lines will be about 100--150, so we're not sure if it makes
>> sense bothering, and taking a risk of breaking e1000, which is a good,
>> old, and stable device.
>>
>> Currently, the e1000e code is stand alone w.r.t. e1000.
>>
>> Please share your thoughts.
>>
>> Thanks in advance,
>> Dmitry.
>>
>>
>> Changes since RFCv2:
>>
>> 1. Device functionality verified using Microsoft Hardware Certification Test 
>> Kit (HCK)
>> 2. Introduced a number of performance improvements
>> 3. The code was cleaned, and rebased to the latest master
>> 4. Patches verified with checkpatch.pl
>>
>> ===
>>
>> Changes since RFCv1:
>>
>> 1. Added support for all the device features:
>>   - Interrupt moderation.
>>   - RSS.
>>   - Multiqueue.
>> 2. Simulated exact PCI/PCIe configuration space layout.
>> 3. Made fixes needed to pass Microsoft's HW certification tests (HCK).
>>
>> This series is still an RFC, because the following tasks are not done yet:
>>
>> 1. See which code can be shared between this device and the existing e1000 
>> device.
>> 2. Rebase patches to the latest master (current base is v2.3.0).
>>
>> Please share your thoughts,
>> Thanks, Dmitry.
>>
>> ===
>>
>> Hello qemu-devel,
>>
>> This patch series is an RFC for the new networking device emulation
>> we're developing for QEMU.
>>
>> This new device emulates the Intel 82574 GbE Controller and works
>> with unmodified Intel e1000e drivers from the Linux/Windows kernels.
>>
>> The status of the current series is "Functional Device Ready, work
>> on

Re: [Qemu-devel] [PATCH 72/77] ppc: A couple more dummy POWER8 Book4 regs

2016-03-03 Thread Benjamin Herrenschmidt

On Wed, 2016-03-02 at 21:30 +0100, Thomas Huth wrote:
> So if you've got some spare time, could you maybe extract all those
> patches that define new SPRs with spr_register_kvm[_hv] and send them as
> a separate patch series? That could help to fix future migration issues,
> and also would decrease the size of your really huge "Add native POWER8
> platform" patch series a little bit!

Time is the problem :-) My tree is bitrotting right now, I am completely
caught up with a few other things.

I'm trying to get somebody to pick up that work.

Cheers,
Ben.

Re: [Qemu-devel] [PATCH 10/34] linux-user: Support for restarting system calls for Microblaze targets

2016-03-03 Thread Edgar E. Iglesias

On Thu, Mar 03, 2016 at 08:15:13PM +, Peter Maydell wrote:
> Hi Edgar -- I'm just looking back at these signal handling
> race condition fix patches, and with this one I have a confusion
> about the Microblaze Linux syscall code that I hope you can
> clear up for me.
> 
> Looking at the kernel entry.S code it looks to me like
> the way syscalls work on microblaze is:
>  * syscall insn is brki r14
>  * the insn itself saves the PC of the brki into r14
>  * on entry the kernel advances r14 by 4 to skip the brki
>  * then SAVE_REGS saves r14 into the 'PC' slot in the pt_regs
>struct
>  * for syscall restart handle_restart() may wind the PC
>value in the pt_regs back by 4
>  * in any case, on syscall exit we pull the PC value out of
>pt_regs into r14, and do a return with rtbd r14, 0

Yes, that sounds right.

> 
> I think what this implies is that:
>  * r14 is a "used by the kernel, may be corrupted at any
>time, not to be touched by userspace" register

Yes. r14 is not really usable by user-space; interrupts, for example, will
clobber r14 at any time as well.

>  * on exit from a syscall PC and r14 are always the same

Yes, that's how it works, but as far as user-space is concerned r14 may have
any value at any time, as it's not really observable in a safe way.

>  * this includes do_sigreturn, ie "taking a signal" is one
>of the things that can corrupt r14

Yes.

> 
> Is that right?

Yes, I think so.
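
To make the sequence discussed above concrete, here is a small toy model of
it. This is only a sketch under the assumptions listed in the quoted text;
the struct and function names are hypothetical and are neither kernel nor
QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two values involved. */
struct mb_regs {
    uint32_t r14;       /* link register written by brki */
    uint32_t saved_pc;  /* the 'PC' slot in pt_regs */
};

/*
 * Model one syscall issued by a brki located at brki_addr.
 * Returns the PC at which userspace resumes.
 */
static uint32_t mb_syscall(struct mb_regs *regs, uint32_t brki_addr,
                           bool restart)
{
    assert(regs);
    regs->r14 = brki_addr;       /* brki saves its own PC into r14 */
    regs->r14 += 4;              /* kernel entry skips the brki */
    regs->saved_pc = regs->r14;  /* SAVE_REGS stores r14 in the PC slot */
    if (restart) {
        regs->saved_pc -= 4;     /* handle_restart() winds the PC back */
    }
    regs->r14 = regs->saved_pc;  /* exit path reloads r14 from the PC slot */
    return regs->r14;            /* rtbd r14, 0 */
}
```

In this model r14 and the resumed PC are always equal on exit, and a
restarted syscall resumes at the brki itself rather than after it.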

> (For context, the original patch is this one:
> http://patchwork.ozlabs.org/patch/514879/
> and I now suspect my review comments at the time to be wrong.)

I see. Functionally I think the patch is OK. It seems to have some whitespace 
fixes mixed with functional changes (nitpick). Either way:

Reviewed-by: Edgar E. Iglesias 

Best regards,
Edgar

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 15/16] vfio: Move iova_pgsizes from container to guest IOMMU

2016-03-03 Thread Alexey Kardashevskiy


On 03/03/2016 10:22 PM, David Gibson wrote:
> On Tue, Mar 01, 2016 at 08:10:40PM +1100, Alexey Kardashevskiy wrote:
> > The page size is an attribute of an IOMMU, not a container as a container
> > may contain more just one IOMMU.
> >
> > This moves iova_pgsizes from VFIOContainer to VFIOGuestIOMMU.
> > The following patch will use this.
> >
> > This removes iova_pgsizes from Type1 IOMMU as it is not used there anyway
> > and when it will get guest visible IOMMU, it will use VFIOGuestIOMMU's
> > iova_pgsizes.
> >
> > Signed-off-by: Alexey Kardashevskiy 
>
> Hmm.  This makes an important semantic change which.. I'm not sure is
> wrong, but certainly isn't adequately addressed in your commit
> message.
>
> The current iova_pgsizes is populated with information about the
> *host* IOMMU, whereas you're replacing it with information about the
> *guest* IOMMU.

Ah, did not realize that. Then it should be not a move but an additional
giommu->iova_pgsizes. And this probably answers todo#1 in 16/16 about page
masks.
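
For reference, the granularity derived from an iova_pgsizes mask is just its
lowest set bit. A minimal standalone sketch, assuming the usual encoding
where bit n set means 2^n-byte pages are supported (the function name is
illustrative; the patch itself uses (hwaddr)1 << ctz64(...)):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest supported page size in a page-size bitmask.
 * Equivalent to 1 << ctz64(pgsizes) for a non-zero mask, written
 * with the portable lowest-set-bit trick instead of a builtin.
 */
static uint64_t min_page_size(uint64_t pgsizes)
{
    assert(pgsizes != 0);
    return pgsizes & (~pgsizes + 1);
}
```

A host mask and a guest mask can give different answers here, which is why
replacing one with the other is a semantic change.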







---
  hw/vfio/common.c  | 16 
  include/hw/vfio/vfio-common.h |  2 +-
  2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f2a03e0..42ef1eb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -313,9 +313,9 @@ out:
  rcu_read_unlock();
  }

-static hwaddr vfio_container_granularity(VFIOContainer *container)
+static hwaddr vfio_container_granularity(VFIOGuestIOMMU *giommu)
  {
-return (hwaddr)1 << ctz64(container->iova_pgsizes);
+return (hwaddr)1 << ctz64(giommu->iova_pgsizes);
  }

  static hwaddr vfio_iommu_page_mask(MemoryRegion *mr)
@@ -392,12 +392,13 @@ static void vfio_listener_region_add(VFIOMemoryListener 
*vlistener,
  section->offset_within_address_space;
  giommu->container = container;
  giommu->n.notify = vfio_iommu_map_notify;
+giommu->iova_pgsizes = 
section->mr->iommu_ops->get_page_sizes(section->mr);
  QLIST_INSERT_HEAD(>giommu_list, giommu, giommu_next);

  memory_region_register_iommu_notifier(giommu->iommu, >n);
  giommu->iommu->iommu_ops->vfio_notify(section->mr, true);
  memory_region_iommu_replay(giommu->iommu, >n,
-   vfio_container_granularity(container),
+   vfio_container_granularity(giommu),
 false);

  return;
@@ -743,14 +744,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
  container->min_iova = 0;
  container->max_iova = (hwaddr)-1;

-/* Assume just 4K IOVA page size */
-container->iova_pgsizes = 0x1000;
  info.argsz = sizeof(info);
  ret = ioctl(fd, VFIO_IOMMU_GET_INFO, );
-/* Ignore errors */
-if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
-container->iova_pgsizes = info.iova_pgsizes;
-}
  } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
 ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
  struct vfio_iommu_spapr_tce_info info;
@@ -811,9 +806,6 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
  }
  container->min_iova = info.dma32_window_start;
  container->max_iova = container->min_iova + info.dma32_window_size - 
1;
-
-/* Assume just 4K IOVA pages for now */
-container->iova_pgsizes = 0x1000;
  } else {
  error_report("vfio: No available IOMMU models");
  ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bcbc5cb..48a1d7f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -80,7 +80,6 @@ typedef struct VFIOContainer {
   * future
   */
  hwaddr min_iova, max_iova;
-uint64_t iova_pgsizes;
  QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
  QLIST_HEAD(, VFIOGroup) group_list;
  QLIST_ENTRY(VFIOContainer) next;
@@ -90,6 +89,7 @@ typedef struct VFIOGuestIOMMU {
  VFIOContainer *container;
  MemoryRegion *iommu;
  hwaddr offset_within_address_space;
+uint64_t iova_pgsizes;
  Notifier n;
  QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
  } VFIOGuestIOMMU;





--
Alexey

Re: [Qemu-devel] [Qemu-ppc] [PATCH] target-ppc: fix sync of SPR_SDR1 with KVM

2016-03-03 Thread Greg Kurz

On Thu, 3 Mar 2016 15:35:07 +1100
David Gibson  wrote:

> On Wed, Mar 02, 2016 at 11:06:19AM +1100, David Gibson wrote:
> > On Tue, Mar 01, 2016 at 07:03:10PM +0100, Greg Kurz wrote:  
> > > The gdbstub can't access guest memory with current master. This is what 
> > > you
> > > get in gdb:
> > > 
> > > > 0x19b8 in main (argc=<error reading variable: Cannot access memory
> > > > at address 0x3fffce4d3620>, argv=<error reading variable: Cannot access memory
> > > > at address 0x3fffce4d3628>) at fp.c:11
> > > 
> > > Bisect leads to the following commit:
> > > 
> > > commit fa48b4328c39b2532e47efcfcba6d4031512f514
> > > Author: David Gibson 
> > > Date:   Tue Feb 9 09:30:21 2016 +1000
> > > 
> > > target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM
> > > 
> > > Looking at the env->external_htab users, I've spotted a behaviour change 
> > > in
> > > kvm_arch_get_registers(), which now always calls ppc_store_sdr1().
> > > 
> > > Checking kvmppc_kern_htab, like it is done in the MMU helpers, fixes the
> > > issue.
> > > 
> > > Signed-off-by: Greg Kurz   
> > 
> > Mea culpa.  Good catch, applied to ppc-for-2.6, thanks.  
> 
> Ah.. wait.. this patch breaks compile for the ppc32 target.  Can you
> fix this please.
> 

Oops... I'm on vacation this week. Not sure I can find time before next
Monday... :\

> > > ---
> > >  target-ppc/kvm.c |2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > > index d67c169ba324..dbc37f25af2b 100644
> > > --- a/target-ppc/kvm.c
> > > +++ b/target-ppc/kvm.c
> > > @@ -1190,7 +1190,7 @@ int kvm_arch_get_registers(CPUState *cs)
> > >  return ret;
> > >  }
> > >  
> > > -if (!env->external_htab) {
> > > +if (!kvmppc_kern_htab && !env->external_htab) {
> > >  ppc_store_sdr1(env, sregs.u.s.sdr1);
> > >  }
> > >  
> > >   
> >   
> 
> 
>

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 15/16] vfio: Move iova_pgsizes from container to guest IOMMU

2016-03-03 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:40PM +1100, Alexey Kardashevskiy wrote:
> The page size is an attribute of an IOMMU, not a container as a container
> may contain more just one IOMMU.
> 
> This moves iova_pgsizes from VFIOContainer to VFIOGuestIOMMU.
> The following patch will use this.
> 
> This removes iova_pgsizes from Type1 IOMMU as it is not used there anyway
> and when it will get guest visible IOMMU, it will use VFIOGuestIOMMU's
> iova_pgsizes.
> 
> Signed-off-by: Alexey Kardashevskiy 

Hmm.  This makes an important semantic change which.. I'm not sure is
wrong, but certainly isn't adequately addressed in your commit
message.

The current iova_pgsizes is populated with information about the
*host* IOMMU, whereas you're replacing it with information about the
*guest* IOMMU.

> ---
>  hw/vfio/common.c  | 16 
>  include/hw/vfio/vfio-common.h |  2 +-
>  2 files changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f2a03e0..42ef1eb 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -313,9 +313,9 @@ out:
>  rcu_read_unlock();
>  }
>  
> -static hwaddr vfio_container_granularity(VFIOContainer *container)
> +static hwaddr vfio_container_granularity(VFIOGuestIOMMU *giommu)
>  {
> -return (hwaddr)1 << ctz64(container->iova_pgsizes);
> +return (hwaddr)1 << ctz64(giommu->iova_pgsizes);
>  }
>  
>  static hwaddr vfio_iommu_page_mask(MemoryRegion *mr)
> @@ -392,12 +392,13 @@ static void vfio_listener_region_add(VFIOMemoryListener 
> *vlistener,
>  section->offset_within_address_space;
>  giommu->container = container;
>  giommu->n.notify = vfio_iommu_map_notify;
> +giommu->iova_pgsizes = 
> section->mr->iommu_ops->get_page_sizes(section->mr);
>  QLIST_INSERT_HEAD(>giommu_list, giommu, giommu_next);
>  
>  memory_region_register_iommu_notifier(giommu->iommu, >n);
>  giommu->iommu->iommu_ops->vfio_notify(section->mr, true);
>  memory_region_iommu_replay(giommu->iommu, >n,
> -   vfio_container_granularity(container),
> +   vfio_container_granularity(giommu),
> false);
>  
>  return;
> @@ -743,14 +744,8 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  container->min_iova = 0;
>  container->max_iova = (hwaddr)-1;
>  
> -/* Assume just 4K IOVA page size */
> -container->iova_pgsizes = 0x1000;
>  info.argsz = sizeof(info);
>  ret = ioctl(fd, VFIO_IOMMU_GET_INFO, );
> -/* Ignore errors */
> -if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> -container->iova_pgsizes = info.iova_pgsizes;
> -}
>  } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
> ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
>  struct vfio_iommu_spapr_tce_info info;
> @@ -811,9 +806,6 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  }
>  container->min_iova = info.dma32_window_start;
>  container->max_iova = container->min_iova + info.dma32_window_size - 
> 1;
> -
> -/* Assume just 4K IOVA pages for now */
> -container->iova_pgsizes = 0x1000;
>  } else {
>  error_report("vfio: No available IOMMU models");
>  ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index bcbc5cb..48a1d7f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -80,7 +80,6 @@ typedef struct VFIOContainer {
>   * future
>   */
>  hwaddr min_iova, max_iova;
> -uint64_t iova_pgsizes;
>  QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>  QLIST_HEAD(, VFIOGroup) group_list;
>  QLIST_ENTRY(VFIOContainer) next;
> @@ -90,6 +89,7 @@ typedef struct VFIOGuestIOMMU {
>  VFIOContainer *container;
>  MemoryRegion *iommu;
>  hwaddr offset_within_address_space;
> +uint64_t iova_pgsizes;
>  Notifier n;
>  QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>  } VFIOGuestIOMMU;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v7 5/9] qemu-log: new option -dfilter to limit output

2016-03-03 Thread Peter Maydell

On 22 February 2016 at 15:59, Alex Bennée  wrote:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 2f0465e..c7e0486 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3094,6 +3094,24 @@ STEXI
>  Output log in @var{logfile} instead of to stderr
>  ETEXI
>
> +DEF("dfilter", HAS_ARG, QEMU_OPTION_DFILTER, \
> +"-dfilter range,..  filter debug output to range of addresses (useful 
> for -d cpu,exec,etc..)\n",
> +QEMU_ARCH_ALL)
> +STEXI
> +@item -dfilter @var{range1}[,...]
> +@findex -dfilter
> +Filter debug output to that relevant to a range of target addresses. The 
> filter
> +spec can be either @var{start}+@var{size}, @var{start}-@var{size} or
> +@var{start}..@var{end} where @var{start} @var{end} and @var{size} are the
> +addresses and sizes required. For example:
> +@example
> +-dfilter 
> 0x8000..0x9000,0xffc8+0x200,0xffc6-0x1000
> +@end example
> +Will dump output for any code in the 0x1000 sized block starting at 0x8000 
> and

The block defined starting at 0x8000 is 0x1001 in size (unless you want to
change it to be "0x8000..0x8fff").
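
To spell out the arithmetic, a tiny sketch, assuming start..end is inclusive
at both ends as the comment above implies (the helper name is made up for
illustration and is not the QEMU parser):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes covered by an inclusive [start, end] range.
 * Note: the full 64-bit range (0..UINT64_MAX) would overflow here.
 */
static uint64_t inclusive_range_size(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}
```

So 0x8000..0x9000 covers 0x1001 bytes, while 0x8000..0x8fff covers exactly
0x1000.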

thanks
-- PMM

[Qemu-devel] [PATCH for-2.6] spapr_pci: fix multifunction hotplug

2016-03-03 Thread Michael Roth

Since 3f1e147, QEMU has adopted a convention of supporting function
hotplug by deferring hotplug events until func 0 is hotplugged.
This is likely how management tools like libvirt would expose
such support going forward.

Since sPAPR guests rely on per-func events rather than slot-based
ones, our protocol has been to hotplug func 0 *first*, to avoid
cases where devices appear within guests without func 0 present,
which could lead to undefined behavior.

To remain compatible with the new convention, defer hotplug in a
similar manner, but then generate events in 0-first order as we
did in the past. Once func 0 is present, fail any attempts to plug
additional functions (as we do with PCIe).

For unplug, defer unplug operations in a similar manner, but
generate unplug events such that function 0 is removed last in the guest.

Signed-off-by: Michael Roth 
---
Note: I'm not super-certain this is 2.6 material/soft-freeze material,
as the current implementation does "work" if one orders device_adds
in the manner enforced by this patch. The main reason I'm tagging as
2.6 is to avoid a future compatibility issue if/when libvirt adds support
for multifunction hotplug in the manner suggested by 3f1e147. This does
however guard a bit better against user error.
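
The event ordering described above can be sketched in isolation as follows.
This is a simplified model only: the present-function bitmask and both
helper names are assumptions for illustration, and the descending order used
for unplug is just one way to get function 0 removed last, not necessarily
what the patch does:

```c
#include <assert.h>
#include <stdint.h>

#define FUNCS_PER_SLOT 8

/*
 * Fill order[] with the present functions of a slot in the order their
 * hotplug events would be generated: function 0 first, then ascending.
 * Bit n of present_mask set means function n is present.
 * Returns the number of events.
 */
static int plug_event_order(uint8_t present_mask, int order[FUNCS_PER_SLOT])
{
    int n = 0;

    assert(order);
    for (int fn = 0; fn < FUNCS_PER_SLOT; fn++) {
        if (present_mask & (1u << fn)) {
            order[n++] = fn;
        }
    }
    return n;
}

/* Same for unplug, walking downwards so that function 0 comes last. */
static int unplug_event_order(uint8_t present_mask, int order[FUNCS_PER_SLOT])
{
    int n = 0;

    assert(order);
    for (int fn = FUNCS_PER_SLOT - 1; fn >= 0; fn--) {
        if (present_mask & (1u << fn)) {
            order[n++] = fn;
        }
    }
    return n;
}
```

With functions 0 and 2 present (mask 0x05), plug events go out as 0, 2 and
unplug events as 2, 0.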

 hw/ppc/spapr_pci.c | 93 ++
 1 file changed, 86 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e8edad3..ab6dece 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1142,14 +1142,21 @@ static void 
spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
 drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, errp);
 }
 
-static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
-   PCIDevice *pdev)
+static sPAPRDRConnector *spapr_phb_get_pci_func_drc(sPAPRPHBState *phb,
+uint32_t busnr,
+int32_t devfn)
 {
-uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
 return spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI,
 (phb->index << 16) |
 (busnr << 8) |
-pdev->devfn);
+devfn);
+}
+
+static sPAPRDRConnector *spapr_phb_get_pci_drc(sPAPRPHBState *phb,
+   PCIDevice *pdev)
+{
+uint32_t busnr = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(pdev))));
+return spapr_phb_get_pci_func_drc(phb, busnr, pdev->devfn);
 }
 
 static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState *phb,
@@ -1173,6 +1180,8 @@ static void spapr_phb_hot_plug_child(HotplugHandler 
*plug_handler,
 PCIDevice *pdev = PCI_DEVICE(plugged_dev);
 sPAPRDRConnector *drc = spapr_phb_get_pci_drc(phb, pdev);
 Error *local_err = NULL;
+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
+uint32_t slotnr = PCI_SLOT(pdev->devfn);
 
 /* if DR is disabled we don't need to do anything in the case of
  * hotplug or coldplug callbacks
@@ -1190,13 +1199,44 @@ static void spapr_phb_hot_plug_child(HotplugHandler 
*plug_handler,
 
 g_assert(drc);
 
+/* Following the QEMU convention used for PCIe multifunction
+ * hotplug, we do not allow functions to be hotplugged to a
+ * slot that already has function 0 present
+ */
+if (plugged_dev->hotplugged && bus->devices[PCI_DEVFN(slotnr, 0)] &&
+PCI_FUNC(pdev->devfn) != 0) {
+error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
+   " additional functions can no longer be exposed to guest.",
+   slotnr, bus->devices[PCI_DEVFN(slotnr, 0)]->name);
+return;
+}
+
 spapr_phb_add_pci_device(drc, phb, pdev, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
-if (plugged_dev->hotplugged) {
-spapr_hotplug_req_add_by_index(drc);
+
+/* If this is function 0, signal hotplug for all the device functions.
+ * Otherwise defer sending the hotplug event.
+ */
+if (plugged_dev->hotplugged && PCI_FUNC(pdev->devfn) == 0) {
+int i;
+
+for (i = 0; i < 8; i++) {
+sPAPRDRConnector *func_drc;
+sPAPRDRConnectorClass *func_drck;
+sPAPRDREntitySense state;
+
+func_drc = spapr_phb_get_pci_func_drc(phb, pci_bus_num(bus),
+  PCI_DEVFN(slotnr, i));
+func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
+func_drck->entity_sense(func_drc, &state);
+
+if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
+spapr_hotplug_req_add_by_index(func_drc);
+}
+}
 }
 }
 
@@ -1219,12 +1259,51 @@ static void spapr_phb_hot_unplug_child(HotplugHandler

[Qemu-devel] [PATCH v8 7/7] s390x/cpu: Allow hotplug of CPUs

2016-03-03 Thread Matthew Rosato

Implement cpu hotplug routine and add the machine hook.

Signed-off-by: Matthew Rosato 
Reviewed-by: David Hildenbrand 
---
 hw/s390x/s390-virtio-ccw.c | 13 +
 target-s390x/cpu.c |  7 +++
 2 files changed, 20 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 7fc1879..174a2f8 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -186,6 +186,18 @@ static HotplugHandler 
*s390_get_hotplug_handler(MachineState *machine,
 return NULL;
 }
 
+static void s390_hot_add_cpu(const int64_t id, Error **errp)
+{
+MachineState *machine = MACHINE(qdev_get_machine());
+Error *err = NULL;
+
+s390_new_cpu(machine->cpu_model, id, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+}
+
 static void ccw_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -194,6 +206,7 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
*data)
 
 mc->init = ccw_init;
 mc->reset = s390_machine_reset;
+mc->hot_add_cpu = s390_hot_add_cpu;
 mc->block_default_type = IF_VIRTIO;
 mc->no_cdrom = 1;
 mc->no_floppy = 1;
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index d1b7af9..4533c94 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -34,6 +34,7 @@
 #ifndef CONFIG_USER_ONLY
 #include "sysemu/arch_init.h"
 #include "sysemu/sysemu.h"
+#include "hw/s390x/sclp.h"
 #endif
 
 #define CR0_RESET   0xE0UL
@@ -240,6 +241,12 @@ static void s390_cpu_realizefn(DeviceState *dev, Error 
**errp)
 #endif
 
 scc->parent_realize(dev, errp);
+
+#if !defined(CONFIG_USER_ONLY)
+if (dev->hotplugged) {
+raise_irq_cpu_hotplug();
+}
+#endif
 }
 
 static void s390_cpu_set_id(Object *obj, Visitor *v, const char *name,
-- 
1.9.1

[Qemu-devel] [PATCH v8 6/7] s390x/cpu: Add error handling to cpu creation

2016-03-03 Thread Matthew Rosato

Check for and propagate errors during s390 cpu creation.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-virtio.c |  2 +-
 target-s390x/cpu-qom.h |  1 +
 target-s390x/cpu.c | 56 +-
 target-s390x/cpu.h |  2 ++
 target-s390x/helper.c  | 42 +++--
 5 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index f00d6b4..2ab7b94 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -116,7 +116,7 @@ void s390_init_cpus(MachineState *machine)
 }
 
 for (i = 0; i < smp_cpus; i++) {
-cpu_s390x_init(machine->cpu_model);
+s390_new_cpu(machine->cpu_model, i, &error_fatal);
 }
 }
 
diff --git a/target-s390x/cpu-qom.h b/target-s390x/cpu-qom.h
index 56d82f2..1c90933 100644
--- a/target-s390x/cpu-qom.h
+++ b/target-s390x/cpu-qom.h
@@ -68,6 +68,7 @@ typedef struct S390CPU {
 /*< public >*/
 
 CPUS390XState env;
+int64_t id;
 /* needed for live migration */
 void *irqstate;
 uint32_t irqstate_saved_size;
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index 76c8eaf..d1b7af9 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -30,8 +30,10 @@
 #include "qemu/error-report.h"
 #include "hw/hw.h"
 #include "trace.h"
+#include "qapi/visitor.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu/arch_init.h"
+#include "sysemu/sysemu.h"
 #endif
 
 #define CR0_RESET   0xE0UL
@@ -199,16 +201,36 @@ static void s390_cpu_realizefn(DeviceState *dev, Error 
**errp)
 CPUS390XState *env = &cpu->env;
 Error *err = NULL;
 
+#if !defined(CONFIG_USER_ONLY)
+if (cpu->id >= max_cpus) {
+error_setg(errp, "Unable to add CPU: %" PRIi64
+   ", max allowed: %d", cpu->id, max_cpus - 1);
+return;
+}
+#endif
+if (cpu_exists(cpu->id)) {
+error_setg(errp, "Unable to add CPU: %" PRIi64
+   ", it already exists", cpu->id);
+return;
+}
+if (cpu->id != scc->next_cpu_id) {
+error_setg(errp, "Unable to add CPU: %" PRIi64
+   ", The next available id is %" PRIi64, cpu->id,
+   scc->next_cpu_id);
+return;
+}
+
 cpu_exec_init(cs, &err);
 if (err != NULL) {
 error_propagate(errp, err);
 return;
 }
+scc->next_cpu_id = cs->cpu_index + 1;
 
 #if !defined(CONFIG_USER_ONLY)
 qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
 #endif
-env->cpu_num = scc->next_cpu_id++;
+env->cpu_num = cpu->id;
 s390_cpu_gdb_init(cs);
 qemu_init_vcpu(cs);
 #if !defined(CONFIG_USER_ONLY)
@@ -220,6 +242,36 @@ static void s390_cpu_realizefn(DeviceState *dev, Error 
**errp)
 scc->parent_realize(dev, errp);
 }
 
+static void s390_cpu_set_id(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+S390CPU *cpu = S390_CPU(obj);
+DeviceState *dev = DEVICE(obj);
+const int64_t min = 0;
+const int64_t max = UINT32_MAX;
+Error *err = NULL;
+int64_t value;
+
+if (dev->realized) {
+error_setg(errp, "Attempt to set property '%s' on '%s' after "
+   "it was realized", name, object_get_typename(obj));
+return;
+}
+
+visit_type_int(v, name, &value, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+if (value < min || value > max) {
+error_setg(errp, "Property %s.%s doesn't take value %" PRId64
+   " (minimum: %" PRId64 ", maximum: %" PRId64 ")" ,
+   object_get_typename(obj), name, value, min, max);
+return;
+}
+cpu->id = value;
+}
+
 static void s390_cpu_initfn(Object *obj)
 {
 CPUState *cs = CPU(obj);
@@ -233,6 +285,8 @@ static void s390_cpu_initfn(Object *obj)
 cs->env_ptr = env;
 cs->halted = 1;
 cs->exception_index = EXCP_HLT;
+object_property_add(OBJECT(cpu), "id", "int64_t", NULL, s390_cpu_set_id,
+NULL, NULL, NULL);
 #if !defined(CONFIG_USER_ONLY)
 qemu_get_timedate(&tm, 0);
 env->tod_offset = TOD_UNIX_EPOCH +
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index 49c8415..6667cc0 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -413,6 +413,8 @@ void trigger_pgm_exception(CPUS390XState *env, uint32_t 
code, uint32_t ilen);
 #endif
 
 S390CPU *cpu_s390x_init(const char *cpu_model);
+S390CPU *s390_new_cpu(const char *cpu_model, int64_t id, Error **errp);
+S390CPU *cpu_s390x_create(const char *cpu_model, Error **errp);
 void s390x_translate_init(void);
 int cpu_s390x_exec(CPUState *cpu);
 
diff --git a/target-s390x/helper.c b/target-s390x/helper.c
index 838bdd9..c48c816 100644
--- a/target-s390x/helper.c
+++ b/target-s390x/helper.c
@@ -30,6 +30,9 @@
 //#define DEBUG_S390
 //#define DEBUG_S390_STDOUT
 
+/* Use to track cpu ID for linux-user only */
+static int64_t next_cpu_id;
+
 #ifdef DEBUG_S390
 #ifdef

[Qemu-devel] [PATCH v8 4/7] s390x/cpu: Tolerate max_cpus

2016-03-03 Thread Matthew Rosato

Once hotplug is enabled, interrupts may come in for CPUs
with an address > smp_cpus.  Allocate for this and allow
search routines to look beyond smp_cpus.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-virtio.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index c501a48..90bc58a 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -58,15 +58,16 @@
 #define S390_TOD_CLOCK_VALUE_MISSING0x00
 #define S390_TOD_CLOCK_VALUE_PRESENT0x01
 
-static S390CPU **ipi_states;
+static S390CPU **cpu_states;
 
 S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
 {
-if (cpu_addr >= smp_cpus) {
+if (cpu_addr >= max_cpus) {
 return NULL;
 }
 
-return ipi_states[cpu_addr];
+/* Fast lookup via CPU ID */
+return cpu_states[cpu_addr];
 }
 
 void s390_init_ipl_dev(const char *kernel_filename,
@@ -101,14 +102,14 @@ void s390_init_cpus(MachineState *machine)
 machine->cpu_model = "host";
 }
 
-ipi_states = g_malloc(sizeof(S390CPU *) * smp_cpus);
+cpu_states = g_malloc0(sizeof(S390CPU *) * max_cpus);
 
-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < max_cpus; i++) {
 S390CPU *cpu;
 
 cpu = cpu_s390x_init(machine->cpu_model);
 
-ipi_states[i] = cpu;
+cpu_states[i] = cpu;
 }
 }
 
-- 
1.9.1

[Qemu-devel] [PATCH v8 2/7] s390x/cpu: Set initial CPU state in common routine

2016-03-03 Thread Matthew Rosato

Both initial and hotplugged CPUs need to set the same initial
state.

Signed-off-by: Matthew Rosato 
Reviewed-by: David Hildenbrand 
---
 hw/s390x/s390-virtio.c | 4 
 target-s390x/cpu.c | 2 ++
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index d40d0dc..c501a48 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -105,14 +105,10 @@ void s390_init_cpus(MachineState *machine)
 
 for (i = 0; i < smp_cpus; i++) {
 S390CPU *cpu;
-CPUState *cs;
 
 cpu = cpu_s390x_init(machine->cpu_model);
-cs = CPU(cpu);
 
 ipi_states[i] = cpu;
-cs->halted = 1;
-cs->exception_index = EXCP_HLT;
 }
 }
 
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index 73a910d..603c2a1 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -219,6 +219,8 @@ static void s390_cpu_initfn(Object *obj)
 #endif
 
 cs->env_ptr = env;
+cs->halted = 1;
+cs->exception_index = EXCP_HLT;
 cpu_exec_init(cs, &error_abort);
 #if !defined(CONFIG_USER_ONLY)
 qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
-- 
1.9.1

[Qemu-devel] [PATCH v8 0/7] Allow hotplug of s390 CPUs

2016-03-03 Thread Matthew Rosato

Changes from v7->v8:

* Patch 3: Rather than using cpu_index to set cpu_num temporarily, squash 
  in pieces from other patches -- specifically next_cpu_id and move of 
  cpu_exec_init to realizefn (David)
* Patch 4: New patch, splits out toleration of max_cpus (Igor)
* Patch 5: 
 * use hotplug_dev instead of qdev_get_machine (Igor)
 * Add missing g_free(name) (Igor)
* Patch 6:
 * Change s390_new_cpu to take cpu_model parm.  Move to helper.c, call 
   s390_new_cpu from cpu_s390x_init.  (David)
 * s/local_err/err/ (David)
 * Drop getter routine for id property, move some sanity checking from 
   setter to realize (David)
 * Drop unnecessary unref (David)
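
The sanity checks that moved to realize in patch 6 can be illustrated with a small standalone sketch. It is only a model of the ordering of the checks: DemoIdStatus and demo_check_cpu_id are hypothetical names, and it assumes IDs are handed out sequentially, so "already exists" is approximated as "below the next free ID" rather than by walking the CPU list.

```c
#include <assert.h>
#include <stdint.h>

typedef enum DemoIdStatus {
    DEMO_ID_OK,
    DEMO_ID_TOO_LARGE, /* id >= max_cpus */
    DEMO_ID_EXISTS,    /* id was already handed out */
    DEMO_ID_NOT_NEXT,  /* id would leave a gap */
} DemoIdStatus;

/*
 * Validate a requested CPU id. The id is assumed to have passed the
 * property setter's range check already, hence the assertion.
 */
DemoIdStatus demo_check_cpu_id(int64_t id, int64_t max_cpus,
                               int64_t next_cpu_id)
{
    assert(id >= 0 && max_cpus > 0 && next_cpu_id >= 0);

    if (id >= max_cpus) {
        return DEMO_ID_TOO_LARGE;
    }
    if (id < next_cpu_id) {
        return DEMO_ID_EXISTS;
    }
    if (id != next_cpu_id) {
        return DEMO_ID_NOT_NEXT;
    }
    return DEMO_ID_OK;
}
```

With maxcpus=4 and two CPUs present, only id 2 is accepted; 1 already exists, 3 would leave a gap, and 4 is out of range.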

**

As discussed in the KVM call, we will go ahead with cpu_add for 
s390x to get cpu hotplug functionality in s390x now, until
architectures that require a more robust hotplug interface
settle on a design.

To configure a guest with 2 CPUs online at 
boot and 4 maximum:

qemu -smp 2,maxcpus=4

Or, when using libvirt:

  <domain>
    ...
    <vcpu current='2'>4</vcpu>
    ...
  </domain>


To subsequently hotplug a CPU:

Issue 'cpu-add <id>' from qemu monitor, or use virsh setvcpus --count <n>
<domain>, where <n> is the total number of desired guest CPUs.

At this point, the guest must bring the CPU online for use -- This can be 
achieved via "echo 1 > /sys/devices/system/cpu/cpuX/online" or via a management 
tool like cpuplugd.
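
The sysfs step can also be done from a small program inside the guest. The sketch below is an illustration only: demo_cpu_online_path and demo_cpu_set_online are hypothetical helper names, the second needs root in the guest, and it assumes the standard Linux sysfs layout quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path for a CPU's online control file. */
int demo_cpu_online_path(char *buf, size_t len, unsigned cpu)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);
    /* Treat truncation as an error rather than using a partial path. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to the control file; returns 0 on success, -1 on failure. */
int demo_cpu_set_online(unsigned cpu)
{
    char path[64];
    FILE *f;
    int ok;

    if (demo_cpu_online_path(path, sizeof(path), cpu) < 0) {
        return -1;
    }
    f = fopen(path, "w");
    if (!f) {
        return -1;
    }
    ok = fputs("1", f) >= 0;
    if (fclose(f) != 0) {
        ok = 0; /* the write is only flushed to sysfs on close */
    }
    return ok ? 0 : -1;
}
```

For the hotplugged CPU with id 2, a guest tool would call demo_cpu_set_online(2) after the cpu-add completes.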

This patch set is based on work previously done by Jason Herne.

Matthew Rosato (7):
  s390x/cpu: Cleanup init in preparation for hotplug
  s390x/cpu: Set initial CPU state in common routine
  s390x/cpu: Get rid of side effects when creating a vcpu
  s390x/cpu: Tolerate max_cpus
  s390x/cpu: Add CPU property links
  s390x/cpu: Add error handling to cpu creation
  s390x/cpu: Allow hotplug of CPUs

 hw/s390x/s390-virtio-ccw.c | 49 ++-
 hw/s390x/s390-virtio.c | 36 +++-
 hw/s390x/s390-virtio.h |  2 +-
 target-s390x/cpu-qom.h |  3 ++
 target-s390x/cpu.c | 83 +++---
 target-s390x/cpu.h |  2 ++
 target-s390x/helper.c  | 42 +--
 7 files changed, 192 insertions(+), 25 deletions(-)

-- 
1.9.1

[Qemu-devel] [PATCH v8 3/7] s390x/cpu: Get rid of side effects when creating a vcpu

2016-03-03 Thread Matthew Rosato

In preparation for hotplug, defer some CPU initialization
until the device is actually being realized, including
cpu_exec_init.

Signed-off-by: Matthew Rosato 
---
 target-s390x/cpu-qom.h |  2 ++
 target-s390x/cpu.c | 20 +++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/target-s390x/cpu-qom.h b/target-s390x/cpu-qom.h
index 029a44a..56d82f2 100644
--- a/target-s390x/cpu-qom.h
+++ b/target-s390x/cpu-qom.h
@@ -47,6 +47,8 @@ typedef struct S390CPUClass {
 CPUClass parent_class;
 /*< public >*/
 
+int64_t next_cpu_id;
+
 DeviceRealize parent_realize;
 void (*parent_reset)(CPUState *cpu);
 void (*load_normal)(CPUState *cpu);
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index 603c2a1..76c8eaf 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -195,7 +195,20 @@ static void s390_cpu_realizefn(DeviceState *dev, Error 
**errp)
 {
 CPUState *cs = CPU(dev);
 S390CPUClass *scc = S390_CPU_GET_CLASS(dev);
+S390CPU *cpu = S390_CPU(dev);
+CPUS390XState *env = &cpu->env;
+Error *err = NULL;
+
+cpu_exec_init(cs, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
 
+#if !defined(CONFIG_USER_ONLY)
+qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
+#endif
+env->cpu_num = scc->next_cpu_id++;
 s390_cpu_gdb_init(cs);
 qemu_init_vcpu(cs);
 #if !defined(CONFIG_USER_ONLY)
@@ -213,7 +226,6 @@ static void s390_cpu_initfn(Object *obj)
 S390CPU *cpu = S390_CPU(obj);
 CPUS390XState *env = &cpu->env;
 static bool inited;
-static int cpu_num = 0;
 #if !defined(CONFIG_USER_ONLY)
 struct tm tm;
 #endif
@@ -221,9 +233,7 @@ static void s390_cpu_initfn(Object *obj)
 cs->env_ptr = env;
 cs->halted = 1;
 cs->exception_index = EXCP_HLT;
-cpu_exec_init(cs, &error_abort);
 #if !defined(CONFIG_USER_ONLY)
-qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
 qemu_get_timedate(&tm, 0);
 env->tod_offset = TOD_UNIX_EPOCH +
   (time2tod(mktimegm(&tm)) * 10ULL);
@@ -232,7 +242,6 @@ static void s390_cpu_initfn(Object *obj)
 env->cpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, s390x_cpu_timer, cpu);
 s390_cpu_set_state(CPU_STATE_STOPPED, cpu);
 #endif
-env->cpu_num = cpu_num++;
 
 if (tcg_enabled() && !inited) {
 inited = true;
@@ -339,6 +348,7 @@ static void s390_cpu_class_init(ObjectClass *oc, void *data)
 CPUClass *cc = CPU_CLASS(scc);
 DeviceClass *dc = DEVICE_CLASS(oc);
 
+scc->next_cpu_id = 0;
 scc->parent_realize = dc->realize;
 dc->realize = s390_cpu_realizefn;
 
@@ -371,7 +381,7 @@ static void s390_cpu_class_init(ObjectClass *oc, void *data)
 cc->gdb_arch_name = s390_gdb_arch_name;
 
 /*
- * Reason: s390_cpu_initfn() calls cpu_exec_init(), which saves
+ * Reason: s390_cpu_realizefn() calls cpu_exec_init(), which saves
  * the object in cpus -> dangling pointer after final
  * object_unref().
  */
-- 
1.9.1

[Qemu-devel] [PATCH v8 5/7] s390x/cpu: Add CPU property links

2016-03-03 Thread Matthew Rosato

Link each CPUState as property machine/cpu[n] during initialization.
Add a hotplug handler to s390-virtio-ccw machine and set the
state during plug.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-virtio-ccw.c | 34 ++
 hw/s390x/s390-virtio.c | 15 +++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index b05ed8b..7fc1879 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -156,10 +156,41 @@ static void ccw_init(MachineState *machine)
 gtod_save, gtod_load, kvm_state);
 }
 
+static void s390_cpu_plug(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+gchar *name;
+S390CPU *cpu = S390_CPU(dev);
+CPUState *cs = CPU(dev);
+
+name = g_strdup_printf("cpu[%i]", cpu->env.cpu_num);
+object_property_set_link(OBJECT(hotplug_dev), OBJECT(cs), name,
+ errp);
+g_free(name);
+}
+
+static void s390_machine_device_plug(HotplugHandler *hotplug_dev,
+ DeviceState *dev, Error **errp)
+{
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+s390_cpu_plug(hotplug_dev, dev, errp);
+}
+}
+
+static HotplugHandler *s390_get_hotplug_handler(MachineState *machine,
+DeviceState *dev)
+{
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+return HOTPLUG_HANDLER(machine);
+}
+return NULL;
+}
+
 static void ccw_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
 NMIClass *nc = NMI_CLASS(oc);
+HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
 mc->init = ccw_init;
 mc->reset = s390_machine_reset;
@@ -171,6 +202,8 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
*data)
 mc->no_sdcard = 1;
 mc->use_sclp = 1;
 mc->max_cpus = 255;
+mc->get_hotplug_handler = s390_get_hotplug_handler;
+hc->plug = s390_machine_device_plug;
 nc->nmi_monitor_handler = s390_nmi;
 }
 
@@ -232,6 +265,7 @@ static const TypeInfo ccw_machine_info = {
 .class_init= ccw_machine_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_NMI },
+{ TYPE_HOTPLUG_HANDLER},
 { }
 },
 };
diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index 90bc58a..f00d6b4 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -97,6 +97,7 @@ void s390_init_ipl_dev(const char *kernel_filename,
 void s390_init_cpus(MachineState *machine)
 {
 int i;
+gchar *name;
 
 if (machine->cpu_model == NULL) {
 machine->cpu_model = "host";
@@ -105,11 +106,17 @@ void s390_init_cpus(MachineState *machine)
 cpu_states = g_malloc0(sizeof(S390CPU *) * max_cpus);
 
 for (i = 0; i < max_cpus; i++) {
-S390CPU *cpu;
-
-cpu = cpu_s390x_init(machine->cpu_model);
+name = g_strdup_printf("cpu[%i]", i);
+object_property_add_link(OBJECT(machine), name, TYPE_S390_CPU,
+ (Object **) &cpu_states[i],
+ object_property_allow_set_link,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+g_free(name);
+}
 
-cpu_states[i] = cpu;
+for (i = 0; i < smp_cpus; i++) {
+cpu_s390x_init(machine->cpu_model);
 }
 }
 
-- 
1.9.1

[Qemu-devel] [PATCH v8 1/7] s390x/cpu: Cleanup init in preparation for hotplug

2016-03-03 Thread Matthew Rosato

Ensure a valid cpu_model is set upfront by setting the
default value directly into the MachineState when none is
specified.  This is needed to ensure hotplugged CPUs share
the same cpu_model.

Signed-off-by: Matthew Rosato 
Reviewed-by: David Hildenbrand 
---
 hw/s390x/s390-virtio-ccw.c | 2 +-
 hw/s390x/s390-virtio.c | 8 
 hw/s390x/s390-virtio.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 89f5d0d..b05ed8b 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -136,7 +136,7 @@ static void ccw_init(MachineState *machine)
 virtio_ccw_register_hcalls();
 
 /* init CPUs */
-s390_init_cpus(machine->cpu_model);
+s390_init_cpus(machine);
 
 if (kvm_enabled()) {
 kvm_s390_enable_css_support(s390_cpu_addr2state(0));
diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index 8e533ae..d40d0dc 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -93,12 +93,12 @@ void s390_init_ipl_dev(const char *kernel_filename,
 qdev_init_nofail(dev);
 }
 
-void s390_init_cpus(const char *cpu_model)
+void s390_init_cpus(MachineState *machine)
 {
 int i;
 
-if (cpu_model == NULL) {
-cpu_model = "host";
+if (machine->cpu_model == NULL) {
+machine->cpu_model = "host";
 }
 
 ipi_states = g_malloc(sizeof(S390CPU *) * smp_cpus);
@@ -107,7 +107,7 @@ void s390_init_cpus(const char *cpu_model)
 S390CPU *cpu;
 CPUState *cs;
 
-cpu = cpu_s390x_init(cpu_model);
+cpu = cpu_s390x_init(machine->cpu_model);
 cs = CPU(cpu);
 
 ipi_states[i] = cpu;
diff --git a/hw/s390x/s390-virtio.h b/hw/s390x/s390-virtio.h
index eebce8e..ffd014c 100644
--- a/hw/s390x/s390-virtio.h
+++ b/hw/s390x/s390-virtio.h
@@ -19,7 +19,7 @@
 typedef int (*s390_virtio_fn)(const uint64_t *args);
 void s390_register_virtio_hypercall(uint64_t code, s390_virtio_fn fn);
 
-void s390_init_cpus(const char *cpu_model);
+void s390_init_cpus(MachineState *machine);
 void s390_init_ipl_dev(const char *kernel_filename,
const char *kernel_cmdline,
const char *initrd_filename,
-- 
1.9.1

Re: [Qemu-devel] [PATCH v4 00/16] data-driven device registers

2016-03-03 Thread Alistair Francis

On Mon, Feb 29, 2016 at 4:26 AM, Alex Bennée  wrote:
>
> Alistair Francis  writes:
>
>> This patch series is based on Peter C's original register API. His
>> original cover letter is below.
>>
>> I have added a new function memory_region_add_subregion_no_print() which
>> stops memory regions from being printed by 'info mtree'. This is used to
>> avoid every register being printed when running 'info mtree'.
>
> OK I've finished my pass of v4. In general I think it is looking OK. I
> think the main things that remain to be addressed are:
>
>   - not breaking up MemoryRegions for each individual register
>   - adding some access MACROs to aid reading/grepping of macro defined
> registers
>   - some patches have un-related changes in them
>
> Let me know when v5 is ready ;-)

Thanks for your review, I'll go through and address your comments.

Thanks,

Alistair

>
>>
>> NOTE: That info qom-tree will still print all of these registers.
>>
>> Future work: Allow support for memory attributes.
>>
>> V4:
>>  - Rebase and fix build issue
>>  - Simplify the register write logic
>>  - Other small fixes suggested by Alex Bennee
>> V3:
>>  - Small changes reported by Fred
>> V2:
>>  - Rebase
>>  - Fix up IOU SLCR connections
>>  - Add the memory_region_add_subregion_no_print() function and use it
>>for the registers
>> Changes since RFC:
>>  - Connect the ZynqMP IOU SLCR device
>>  - Rebase
>>
>> Original cover letter From Peter:
>> Hi All. This is a new scheme I've come up with handling device registers in a
>> data driven way. My motivation for this is to factor out a lot of the access
>> checking that seems to be replicated in every device. See P1 commit message 
>> for
>> further discussion.
>>
>> P1 is the main patch, adds the register definition functionality
>> P2-3,6 add helpers that glue the register API to the Memory API
>> P4 Defines a set of macros that minimise register and field definitions
>> P5 is QOMfication
>> P7 is a trivial
>> P10-13 Work up to GPIO support
>> P8,9,14 add new devices (the Xilinx Zynq devcfg & ZynqMP SLCR) that use this
>> scheme.
>> P15: Connect the ZynqMP SLCR device
>>
>> This Zynq devcfg device was particularly finicky with per-bit restrictions.
>> I'm also looking for a higher-than-usual modelling fidelity
>> on the register space, with semantics defined for random reserved bits
>> in-between otherwise consistent fields.
>>
>> Here's an example of the qemu_log output for the devcfg device. This is 
>> produced
>> by now generic sharable code:
>>
>> /machine/unattached/device[44]:Addr 0x08:CFG: write of value 0508
>> /machine/unattached/device[44]:Addr 0x80:MCTRL: write of value 00800010
>> /machine/unattached/device[44]:Addr 0x10:INT_MASK: write of value 
>> 
>> /machine/unattached/device[44]:Addr :CTRL: write of value 0c00607f
>>
>> And an example of a rogue guest banging on a bad bit:
>>
>> /machine/unattached/device[44]:Addr 0x14:STATUS bits 0x01 may not be 
>> \
>>   written to 1
>>
>> A future feature I am interested in is implementing TCG optimisation of
>> side-effectless registers. The register API allows clear definition of
>> what registers have txn side effects and which ones don't. You could even
>> go a step further and translate such side-effectless accesses based on the
>> data pointer for the register.
>>
>>
>> Alistair Francis (3):
>>   memory: Allow subregions to not be printed by info mtree
>>   register: Add Register API
>>   xlnx-zynqmp: Connect the ZynqMP IOU SLCR
>>
>> Peter Crosthwaite (13):
>>   register: Add Memory API glue
>>   register: Add support for decoding information
>>   register: Define REG and FIELD macros
>>   register: QOMify
>>   register: Add block initialise helper
>>   bitops: Add ONES macro
>>   dma: Add Xilinx Zynq devcfg device model
>>   xilinx_zynq: add devcfg to machine model
>>   qdev: Define qdev_get_gpio_out
>>   qdev: Add qdev_pass_all_gpios API
>>   irq: Add opaque setter routine
>>   register: Add GPIO API
>>   misc: Introduce ZynqMP IOU SLCR
>>
>>  default-configs/arm-softmmu.mak|   1 +
>>  hw/arm/xilinx_zynq.c   |   8 +
>>  hw/arm/xlnx-zynqmp.c   |  13 ++
>>  hw/core/Makefile.objs  |   1 +
>>  hw/core/irq.c  |   5 +
>>  hw/core/qdev.c |  21 ++
>>  hw/core/register.c | 348 +
>>  hw/dma/Makefile.objs   |   1 +
>>  hw/dma/xlnx-zynq-devcfg.c  | 393 
>> +
>>  hw/misc/Makefile.objs  |   1 +
>>  hw/misc/xlnx-zynqmp-iou-slcr.c | 113 ++
>>  include/exec/memory.h  |  17 ++
>>  include/hw/arm/xlnx-zynqmp.h   |   2 +
>>  include/hw/dma/xlnx-zynq-devcfg.h  |  62 ++
>>  include/hw/irq.h

Re: [Qemu-devel] [PATCH v7 5/9] qemu-log: new option -dfilter to limit output

2016-03-03 Thread Richard Henderson


On 03/03/2016 06:04 AM, Alex Bennée wrote:


Richard Henderson  writes:


On 02/22/2016 07:59 AM, Alex Bennée wrote:

+qemu_set_dfilter_ranges("0x1000+0x100");
+
+g_assert_false(qemu_log_in_addr_range(0xfff));
+g_assert(qemu_log_in_addr_range(0x1000));
+g_assert(qemu_log_in_addr_range(0x1100));


This is exactly what I was talking about wrt off-by-one error in my first
review -- 0x100 bytes, including 0x1000, finishes at 0x10ff.

This third test should fail.


OK so should 0x100+0x0 fail as it makes no sense, 0 bytes from 0x100 start?


Yes, I would think so.


r~

Re: [Qemu-devel] [PATCH 10/34] linux-user: Support for restarting system calls for Microblaze targets

2016-03-03 Thread Peter Maydell

Hi Edgar -- I'm just looking back at these signal handling
race condition fix patches, and with this one I have a confusion
about the Microblaze Linux syscall code that I hope you can
clear up for me.

Looking at the kernel entry.S code it looks to me like
the way syscalls work on microblaze is:
 * syscall insn is brki r14
 * the insn itself saves the PC of the brki into r14
 * on entry the kernel advances r14 by 4 to skip the brki
 * then SAVE_REGS saves r14 into the 'PC' slot in the pt_regs
   struct
 * for syscall restart handle_restart() may wind the PC
   value in the pt_regs back by 4
 * in any case, on syscall exit we pull the PC value out of
   pt_regs into r14, and do a return with rtbd r14, 0

I think what this implies is that:
 * r14 is a "used by the kernel, may be corrupted at any
   time, not to be touched by userspace" register
 * on exit from a syscall PC and r14 are always the same
 * this includes do_sigreturn, ie "taking a signal" is one
   of the things that can corrupt r14

Is that right?

(For context, the original patch is this one:
http://patchwork.ozlabs.org/patch/514879/
and I now suspect my review comments at the time to be wrong.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-03-03 Thread Dr. David Alan Gilbert

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >>On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >>>* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> >>* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>>From: root 
> >>>
> >>>This is the 15th version of COLO (Still only support periodic 
> >>>checkpoint).
> >>>
> >>>Here is only COLO frame part, you can get the whole codes from github:
> >>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>
> >>>There are little changes for this series except the network related 
> >>>part.
> >>
> >>I was looking at the time the guest is paused during COLO and
> >>was surprised to find one of the larger chunks was the time to reset
> >>the guest before loading each checkpoint;  I've traced it part way, the
> >>biggest contributors for my test VM seem to be:
> >>
> >>   3.8ms  pcibus_reset: VGA
> >>   1.8ms  pcibus_reset: virtio-net-pci
> >>   1.5ms  pcibus_reset: virtio-blk-pci
> >>   1.5ms  qemu_devices_reset: piix4_reset
> >>   1.1ms  pcibus_reset: piix3-ide
> >>   1.1ms  pcibus_reset: virtio-rng-pci
> >>
> >>I've not looked deeper yet, but some of these are very silly;
> >>I'm running with -nographic so why it's taking 3.8ms to reset VGA is
> >>going to be interesting.
> >>Also, my only block device is the virtio-blk, so while I understand the
> >>standard PC machine has the IDE controller, why it takes it over a ms
> >>to reset an unused device.
> >
> >OK, so I've dug a bit deeper, and it appears that it's the changes in
> >PCI bars that actually take the time;  every time we do a reset we
> >reset all the BARs, this causes it to do a pci_update_mappings and
> >end up doing a memory_region_del_subregion.
> >Then we load the config space of the PCI device as we do the 
> >vmstate_load,
> >and this recreates all the mappings again.
> >
> >I'm not sure what the fix is, but that sounds like it would
> >speed up the checkpoints usefully if we can avoid the map/remap when
> >they're the same.
> >
> 
> Interesting, and thanks for your report.
> 
> We already know qemu_system_reset() is a time-consuming function, we 
> shouldn't
> call it here, but if we didn't do that, there will be a bug, which we have
> reported before in the previous COLO series, the below is the copy of 
> the related
> patch comment:
> >
> >Paolo suggested one fix, see the patch below;  I'm not sure if it's safe
> >(in particular if the guest changed a bar and the device code tried to 
> >access the memory
> >while loading the state???) - but it does seem to work and shaves ~10ms off 
> >the reset/load
> >times:
> >
> 
> Nice work, i also tested it, and it is a good improvement, I'm wondering if 
> it is safe here,
> it should be safe to apply to qemu_system_reset() independently (I tested it 
> too,
> it will shaves about 5ms off).

Yes, it seems quite nice.
I did find today one VM that won't boot with COLO with that change; it's
an ubuntu VM that has a delay in Grub, and it's when it does the first
checkpoint during Grub still being displayed it gets an error from
the inbound migrate.

The error is VQ 0 size 0x80 Guest index 0x2444 inconsistent with Host index 
0x119e: delta 0x12a6
from virtio-blk - so maybe virtio-blk is accessing the memory during loading.

Dave
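
For anyone not familiar with why the transaction helps, the batching idea can be modelled in a few lines. This is a standalone sketch with hypothetical names (DemoMemMap, demo_*), not QEMU's memory API: each change outside a transaction triggers an expensive rebuild, while changes inside one are coalesced into a single rebuild at the outermost commit.

```c
#include <assert.h>

typedef struct DemoMemMap {
    unsigned depth;    /* nesting depth of open transactions */
    int pending;       /* a change was made since the last rebuild */
    unsigned rebuilds; /* how many times the expensive rebuild ran */
} DemoMemMap;

/* Run the (modelled) expensive rebuild only when no transaction is open. */
static void demo_rebuild_if_idle(DemoMemMap *m)
{
    if (m->depth == 0 && m->pending) {
        m->pending = 0;
        m->rebuilds++;
    }
}

void demo_txn_begin(DemoMemMap *m)
{
    m->depth++;
}

/* Record a mapping change, e.g. a BAR being unmapped or remapped. */
void demo_map_change(DemoMemMap *m)
{
    m->pending = 1;
    demo_rebuild_if_idle(m);
}

void demo_txn_commit(DemoMemMap *m)
{
    assert(m->depth > 0);
    m->depth--;
    demo_rebuild_if_idle(m);
}
```

The open question from the patch comment applies to the model too: between begin and commit, readers see a map that does not yet reflect the pending changes.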

> Hailiang
> 
> >Dave
> >
> >commit 7570b2984143860005ad9fe79f5394c75f294328
> >Author: Dr. David Alan Gilbert 
> >Date:   Tue Mar 1 12:08:14 2016 +
> >
> > COLO: Lock memory map around reset/load
> >
> > Changing the memory map appears to be expensive; we see this
> > particularly when on loading a checkpoint we:
> >a) reset the devices
> >   This causes PCI bars to be reset
> >b) Loading the device states
> >   This causes the PCI bars to be reloaded.
> >
> > Turning this all into a single memory_region_transaction saves
> >  ~10ms/checkpoint.
> >
> > TBD: What happens if the device code accesses the RAM during loading
> > the checkpoint?
> >
> > Signed-off-by: Dr. David Alan Gilbert 
> > Suggested-by: Paolo Bonzini 
> >
> >diff --git a/migration/colo.c b/migration/colo.c
> >index 45c3432..c44fb2a 100644
> >--- a/migration/colo.c
> >+++ b/migration/colo.c
> >@@ -22,6 +22,7 @@
> >  #include "net/colo-proxy.h"
> >  #include "net/net.h"
> >  #include "block/block_int.h"
> >+#include "exec/memory.h"
> >
> >  static bool vmstate_loading;
> >
> >@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void

[Qemu-devel] [PATCH v4 2/3] generic-loader: Add a generic loader

2016-03-03 Thread Alistair Francis

Add a generic loader to QEMU which can be used to load images or set
memory values.

Signed-off-by: Alistair Francis 
---
V4:
 - Allow the loader to work with every architecture
 - Move the file to hw/core
 - Increase the maximum number of CPUs
 - Make the CPU operations conditional
 - Convert the cpu option to cpu-num
 - Require the user to specify endianness
V3:
 - Pass the ram_size to load_image_targphys()
V2:
 - Add maintainers entry
 - Perform bounds checking
 - Register and unregister the reset in the realise/unrealise
Changes since RFC:
 - Add BE support

 MAINTAINERS  |   6 ++
 hw/core/Makefile.objs|   2 +
 hw/core/generic-loader.c | 144 +++
 include/hw/core/generic-loader.h |  45 
 4 files changed, 197 insertions(+)
 create mode 100644 hw/core/generic-loader.c
 create mode 100644 include/hw/core/generic-loader.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a5853cd..337cc1b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -971,6 +971,12 @@ F: hw/acpi/nvdimm.c
 F: hw/mem/nvdimm.c
 F: include/hw/mem/nvdimm.h
 
+Generic Loader
+M: Alistair Francis 
+S: Maintained
+F: hw/core/generic-loader.c
+F: include/hw/core/generic-loader.h
+
 Subsystems
 --
 Audio
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index abb3560..b5d5197 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -15,3 +15,5 @@ common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
+
+obj-$(CONFIG_SOFTMMU) += generic-loader.o
diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
new file mode 100644
index 000..1b21934
--- /dev/null
+++ b/hw/core/generic-loader.c
@@ -0,0 +1,144 @@
+/*
+ * Generic Loader
+ *
+ * Copyright (C) 2014 Li Guang
+ * Copyright (C) 2016 Xilinx Inc.
+ * Written by Li Guang 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "sysemu/dma.h"
+#include "hw/loader.h"
+#include "hw/core/generic-loader.h"
+
+#define CPU_NONE 0x
+
+static void generic_loader_reset(void *opaque)
+{
+GenericLoaderState *s = GENERIC_LOADER(opaque);
+
+if (s->cpu) {
+CPUClass *cc = CPU_GET_CLASS(s->cpu);
+cpu_reset(s->cpu);
+if (cc) {
+cc->set_pc(s->cpu, s->addr);
+}
+}
+
+if (s->data_len) {
+assert(s->data_len < sizeof(s->data));
+dma_memory_write((s->cpu ? s->cpu : first_cpu)->as, s->addr, &s->data,
+ s->data_len);
+}
+}
+
+static void generic_loader_realize(DeviceState *dev, Error **errp)
+{
+GenericLoaderState *s = GENERIC_LOADER(dev);
+hwaddr entry;
+int big_endian;
+int size = 0;
+
+qemu_register_reset(generic_loader_reset, dev);
+
+if (s->cpu_num != CPU_NONE) {
+s->cpu = qemu_get_cpu(s->cpu_num);
+if (!s->cpu) {
+error_setg(errp, "Specified boot CPU#%d is nonexistent",
+   s->cpu_num);
+return;
+}
+}
+
+#ifdef TARGET_WORDS_BIGENDIAN
+big_endian = 1;
+#else
+big_endian = 0;
+#endif
+
+if (s->file) {
+if (!s->force_raw) {
+size = load_elf(s->file, NULL, NULL, &entry, NULL, NULL,
+big_endian, 0, 0);
+
+if (size < 0) {
+size = load_uimage(s->file, &entry, NULL, NULL, NULL, NULL);
+}
+}
+
+if (size < 0) {
+/* Default to the maximum size being the machine's ram size */
+size = load_image_targphys(s->file, s->addr, ram_size);
+} else {
+s->addr = entry;
+}
+
+if (size < 0) {
+error_setg(errp, "Cannot load specified image %s", s->file);
+return;
+}
+}
+
+if (s->data_len && (s->data_len > sizeof(s->data))) {
+error_setg(errp, "data-len cannot be more than the data size");
+return;
+}
+
+/* Convert the data endianness */
+if (s->data_be) {
+s->data = cpu_to_be64(s->data);
+} else {
+s->data = cpu_to_le64(s->data);
+}
+}
+
+static void generic_loader_unrealize(DeviceState *dev, Error **errp)
+{
+qemu_unregister_reset(generic_loader_reset, dev);
+}
+
+static Property generic_loader_props[] = {
+DEFINE_PROP_UINT64("addr",

[Qemu-devel] [PATCH v4 0/3] Add a generic loader

2016-03-03 Thread Alistair Francis

This work is based on the original work by Li Guang with extra
features added by Peter C and myself.

The idea of this loader is to allow the user to load multiple images
or values into QEMU at startup.

Memory values can be loaded like this: -device 
loader,addr=0xfd1a0104,data=0x800e,data-len=4

Images can be loaded like this: -device loader,file=./images/u-boot.elf,cpu=0

This can be useful and we use it a lot in Xilinx to load multiple images
into a machine at creation (ATF, Kernel and DTB for example).

It can also be used to set registers.

This patch series also makes the load_elf() function more generic by not
requiring an architecture.
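
To illustrate what a data/data-len pair with an explicit byte order means for the bytes that end up in memory, here is a minimal standalone sketch. It is not the code from patch 2 (which converts the whole 64-bit value and then writes data-len bytes); demo_store_value is a hypothetical helper that writes the low `len` bytes of the value in the requested order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the low `len` bytes of `value` at `mem` in the chosen byte order.
 * Returns false if `len` is zero or larger than the value itself.
 */
bool demo_store_value(uint8_t *mem, uint64_t value, size_t len,
                      bool big_endian)
{
    size_t i;

    assert(mem != NULL);
    if (len == 0 || len > sizeof(value)) {
        return false;
    }
    for (i = 0; i < len; i++) {
        /* i counts up from the least significant byte of value. */
        uint8_t byte = (uint8_t)(value >> (8 * i));
        size_t pos = big_endian ? len - 1 - i : i;

        mem[pos] = byte;
    }
    return true;
}
```

A 4-byte store of 0x11223344 therefore puts 0x44 first for little endian and 0x11 first for big endian.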

V4:
 - Re-write documentation
 - Allow the loader to work with every architecture
 - Move the file to hw/core
 - Increase the maximum number of CPUs
 - Make the CPU operations conditional
 - Convert the cpu option to cpu-num
 - Require the user to specify endianness
V2:
 - Add an entry to the maintainers file
 - Add some documentation
 - Perform bounds checking on the data_len
 - Register and unregister the reset in the realise/unrealise
Changes since RFC:
 - Add support for BE


Alistair Francis (3):
  loader: Allow ELF loader to auto-detect the ELF arch
  generic-loader: Add a generic loader
  docs: Add a generic loader explanation document

 MAINTAINERS  |   6 ++
 docs/generic-loader.txt  |  56 +++
 hw/core/Makefile.objs|   2 +
 hw/core/generic-loader.c | 144 +++
 hw/core/loader.c |  10 +++
 include/hw/core/generic-loader.h |  45 
 6 files changed, 263 insertions(+)
 create mode 100644 docs/generic-loader.txt
 create mode 100644 hw/core/generic-loader.c
 create mode 100644 include/hw/core/generic-loader.h

-- 
2.5.0

[Qemu-devel] [PATCH v4 3/3] docs: Add a generic loader explanation document

2016-03-03 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
V4:
 - Re-write to be more comprehensive

 docs/generic-loader.txt | 56 +
 1 file changed, 56 insertions(+)
 create mode 100644 docs/generic-loader.txt

diff --git a/docs/generic-loader.txt b/docs/generic-loader.txt
new file mode 100644
index 000..b748f2a
--- /dev/null
+++ b/docs/generic-loader.txt
@@ -0,0 +1,56 @@
+Copyright (c) 2016 Xilinx Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+
+The 'loader' device allows the user to load multiple images or values into
+QEMU at startup.
+
+Loading Memory Values
+---
+The loader device allows memory values to be set from the command line. This
+can be done by following the syntax below:
+
+-device loader,addr=<addr>,data=<data>,data-len=<data-len>
+-device loader,addr=<addr>,cpu-num=<cpu-num>
+
+NOTE: The loader device supports other options (see the next section) but they
+  do not apply to setting memory values and will be ignored.
+  It is also possible to mix the commands above, e.g. include the cpu-num
+  argument with the data argument.
+
+<addr>      - The address to store the data at, or the value to set the
+              CPU's PC to
+<data>      - The value to be written to the addr. The maximum size of the
+              data is 8 bytes.
+<data-len>  - The length of the data in bytes. This argument must be included
+              if the data argument is.
+<data-be>   - Set to true if the data to be stored on the guest should be
+              written as big endian data. The default is to write little
+              endian data.
+<cpu-num>   - This will cause the CPU to be reset and the PC to be set to
+              the value of addr.
+
+An example of loading value 0x800e to address 0xfd1a0104 is:
+-device loader,addr=0xfd1a0104,data=0x800e,data-len=4
+
+Loading Files
+---
+The loader device also allows files to be loaded into memory. This can be done
+similarly to setting memory values. The syntax is shown below:
+
+-device loader,file=<file>,addr=<addr>,cpu-num=<cpu-num>,force-raw=<force-raw>
+
+<file>      - A file to be loaded into memory
+<addr>      - The addr in memory where the file should be loaded. This is
+              ignored if you are using an ELF (unless force-raw is true).
+              This is required if you aren't loading an ELF.
+<cpu-num>   - This specifies the CPU that should be used. This is an
+              optional argument and will cause the CPU's PC to be set to
+              where the image is stored. This option should only be used
+              for the boot image.
+<force-raw> - Forces the file to be treated as a raw image. This can be
+              used to specify the load address of ELF files.
+
+An example of loading an ELF file which CPU0 will boot is shown below:
+-device loader,file=./images/boot.elf,cpu-num=0
-- 
2.5.0
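
As a rough illustration of the memory-value case described in the document
above, the sketch below stores the low data-len bytes of a 64-bit value into
a byte buffer in either byte order. The helper name store_value() and the flat
buffer standing in for guest memory are assumptions made for this example; the
device itself writes through QEMU's address-space APIs and its exact byte
handling may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: write the low 'len' bytes
 * (1..8) of 'value' to 'dst'. The most significant byte goes first when
 * 'big_endian' is true, the least significant byte first otherwise.
 * Returns 0 on success, -1 if 'len' is out of range.
 */
static int store_value(uint8_t *dst, uint64_t value, size_t len,
                       bool big_endian)
{
    size_t i;

    if (len == 0 || len > sizeof(value)) {
        return -1;
    }
    for (i = 0; i < len; i++) {
        /* Byte i, counted from the least significant end of 'value'. */
        uint8_t byte = (uint8_t)(value >> (8 * i));

        if (big_endian) {
            dst[len - 1 - i] = byte;
        } else {
            dst[i] = byte;
        }
    }
    return 0;
}
```

With data=0x800e and data-len=4, the little endian layout is 0e 80 00 00 and
the big endian layout is 00 00 80 0e.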

[Qemu-devel] [PATCH v4 1/3] loader: Allow ELF loader to auto-detect the ELF arch

2016-03-03 Thread Alistair Francis

If the caller didn't specify an architecture for the ELF machine
the load_elf() function will auto detect it based on the ELF file.

Signed-off-by: Alistair Francis 
---

 hw/core/loader.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 3a57415..3a50771 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -339,6 +339,7 @@ int load_elf(const char *filename, uint64_t 
(*translate_fn)(void *, uint64_t),
 {
 int fd, data_order, target_data_order, must_swab, ret = ELF_LOAD_FAILED;
 uint8_t e_ident[EI_NIDENT];
+uint16_t e_machine;
 
 fd = open(filename, O_RDONLY | O_BINARY);
 if (fd < 0) {
@@ -371,6 +372,15 @@ int load_elf(const char *filename, uint64_t 
(*translate_fn)(void *, uint64_t),
 goto fail;
 }
 
+if (elf_machine < 1) {
+/* The caller didn't specify an ARCH, we can figure it out */
+lseek(fd, 0x12, SEEK_SET);
+if (read(fd, &e_machine, sizeof(e_machine)) != sizeof(e_machine)) {
+goto fail;
+}
+elf_machine = e_machine;
+}
+
 lseek(fd, 0, SEEK_SET);
 if (e_ident[EI_CLASS] == ELFCLASS64) {
 ret = load_elf64(filename, fd, translate_fn, translate_opaque, 
must_swab,
-- 
2.5.0
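
The auto-detection idea can be sketched on an in-memory header: e_machine is
a 16-bit field at offset 0x12 in both ELF32 and ELF64 headers. The function
name elf_header_machine() and the SKETCH_* constants are made up for this
example. Unlike the hunk above, which reads the field with read() in host
byte order, the sketch decodes it according to the EI_DATA byte; that choice
is an assumption of the example, not something the patch does.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EI_DATA      5    /* index of the data-encoding byte in e_ident */
#define SKETCH_ELFDATA2LSB  1    /* file is little endian */
#define SKETCH_ELFDATA2MSB  2    /* file is big endian */
#define SKETCH_E_MACHINE    0x12 /* offset of e_machine (ELF32 and ELF64) */

/*
 * Hypothetical helper: return the e_machine value from the first 'len'
 * bytes of an ELF file, or -1 if the buffer is too short, lacks the ELF
 * magic, or has an unknown data encoding.
 */
static int elf_header_machine(const uint8_t *hdr, size_t len)
{
    if (len < SKETCH_E_MACHINE + 2) {
        return -1;
    }
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') {
        return -1;
    }
    switch (hdr[SKETCH_EI_DATA]) {
    case SKETCH_ELFDATA2LSB:
        return hdr[SKETCH_E_MACHINE] | (hdr[SKETCH_E_MACHINE + 1] << 8);
    case SKETCH_ELFDATA2MSB:
        return (hdr[SKETCH_E_MACHINE] << 8) | hdr[SKETCH_E_MACHINE + 1];
    default:
        return -1;
    }
}
```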

[Qemu-devel] [RFC] host and guest kernel trace merging

2016-03-03 Thread Luiz Capitulino


Very recently, trace-cmd got a few new features that allow you
to merge the host and guest kernel traces using the host TSC.

Those features originated in the tracing we're doing to debug spikes
in real-time KVM. However, as real-time KVM uses a very specific
setup and as we have so far debugged a very simple application,
I'm wondering: is this feature useful for the general, non-realtime,
use-cases?

If the answer is yes, then I've got several ideas on how to
make host and guest trace merging extremely simple to use.

I'll first describe how we do tracing for real-time KVM. Then
I'll give some suggestions on how to use the same procedure
for unpinned use-cases. Lastly, I'll talk about how we could
make it easy to use.

Real-time KVM host and guest tracing


In real-time KVM, each guest's vCPU is pinned to a different host
core. The real-time application running in the guest is also pinned.
When we get a spike, we know in which guest CPU it occurred, and
so we know in which host core this CPU was running. All we have to
do is to get a trace of that guest CPU/host core pair.

1. Setup


You'll need the following:

 1. A stable TSC
 2. The TSC offset of the guest you want to debug
(see below)
 3. A way for your guest to transfer a file to your
host (I use networking)
 4. Latest trace-cmd.git in both host and guest
(HEAD c21aae2c or later)

To get the TSC offset of the guest, you can use the kvm_write_tsc_offset
tracepoint in the host. I use this script to do it:

 http://people.redhat.com/~lcapitul/real-time/start-debug-guest

Yes, it sucks. I have an idea on how to improve this (keep reading).

2. Tracing
--

In summary, what you have to do is:

 1. run trace-cmd start -C x86-tsc in the host
 2. run trace-cmd record -C x86-tsc in the guest
 3. run trace-cmd stop in the host
 4. run trace-cmd extract in the host
 5. copy the guest's trace.dat file to a known
location in the host

This guest script does all that:

 http://people.redhat.com/~lcapitul/real-time/trace-host-and-guest

I run it like this:

 # trace-host-and-guest cyclictest -q -n -b10 --notrace

3. Merging
--

Merging is simple:

 $ trace-cmd report -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
-i guest-trace.dat

For real-time KVM, we also want to see the difference in nanoseconds
of each line in the trace so we use additional options:

 $ trace-cmd report --ts-diff --ts2secs HOST-Hz -t \
-i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
-i guest-trace.dat

Here's a real example:

 $ trace-cmd report --ts-diff --ts2secs 260 -t \
-i host-trace.dat --ts-offset $((18443676333795429734)) \
-i guest-trace.dat

And here's a little extract of a merged trace where the host injects
a timer interrupt, the guest handles it and reprograms the next
hrtimer timer. The value in "()" is how many nanoseconds it took
between the previous and the following line:

 host-trace.dat: qemu-kvm-3699  [004] [004]  6196.749398857: (+88)    function: kvm_inject_pending_timer_irqs <-- kvm_arch_vcpu_ioctl_run
 host-trace.dat: qemu-kvm-3699  [004] [004]  6196.749398990: (+133)   kvm_entry:vcpu 0
guest-trace.dat:      <idle>-0  [000] [000]  6196.749399096: (+106)   function: hrtimer_interrupt <-- local_apic_timer_interrupt
guest-trace.dat:      <idle>-0  [000] [000]  6196.749399123: (+27)    function: hrtimer_wakeup <-- __run_hrtimer
guest-trace.dat:      <idle>-0  [000] [000]  6196.749399183: (+60)    function: tick_program_event <-- hrtimer_interrupt
 host-trace.dat: qemu-kvm-3699  [004] [004]  6196.749399219: (+36)    kvm_exit: reason MSR_WRITE rip 0x8104bf58 info 0 0
 host-trace.dat: qemu-kvm-3699  [004] [004]  6196.749399260: (+41)    function: kvm_set_lapic_tscdeadline_msr <-- kvm_set_msr_common
 host-trace.dat: qemu-kvm-3699  [004] [004]  6196.749399283: (+23)    function: hrtimer_start <-- start_apic_timer
 host-trace.dat: qemu-kvm-3699  [004] [004]  6196.749399336: (+53)    kvm_entry:vcpu 0
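
Two bits of arithmetic from the merging step can be sketched in C. The offset
in the real example is close to 2^64, so it reads as a negative number when
taken as 64-bit two's complement, and a per-line difference in TSC ticks
becomes nanoseconds once divided by the counter frequency. The function names
below are made up for this sketch, and it does not claim to match what
trace-cmd does internally.

```c
#include <stdint.h>

/*
 * Hypothetical helper: magnitude of a negative offset that is stored as
 * an unsigned 64-bit two's complement value. Unsigned subtraction wraps,
 * so this is well defined in C.
 */
static uint64_t tsc_offset_magnitude(uint64_t offset)
{
    return (uint64_t)0 - offset;
}

/*
 * Hypothetical helper: convert TSC ticks to nanoseconds for a counter
 * running at 'hz' ticks per second. The division is split so that the
 * remainder term cannot overflow while 'hz' stays below about 1.8e10.
 * Returns 0 if 'hz' is 0.
 */
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t hz)
{
    const uint64_t ns_per_sec = 1000000000ULL;

    if (hz == 0) {
        return 0;
    }
    return (ticks / hz) * ns_per_sec + (ticks % hz) * ns_per_sec / hz;
}
```

For the offset shown above, tsc_offset_magnitude(18443676333795429734ULL)
gives 3067739914121882. With an assumed 2 GHz counter, 200 ticks come out as
100 ns.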

Unpinned use-cases
==

If you can't pin the guest vCPU threads and the guest application like
we do in real-time KVM, you could try the following:

 * If your guest has a single CPU, or you want to trace a
   specific guest vCPU then try to pass -P vCPU-TID when
   running "trace-cmd record start" in the host

 * If you want to trace multiple vCPUs, I think you could
   try to trace all cores where the vCPUs could run with -M.
   Then you could try to merge this with the guest trace and
   see if you get a single timeline of all cores and guest CPUs

trace-cmd-server


Everything I described could look like this:

  # trace-cmd server [ in

Re: [Qemu-devel] [PATCH 4/7] target-i386: Dump illegal opcodes with -d unimp

2016-03-03 Thread Richard Henderson


On 03/03/2016 02:08 AM, Paolo Bonzini wrote:

Do you want LOG_UNIMP or LOG_GUEST_ERROR?


I would actually use LOG_IN_ASM.  As you noticed, guests sometimes use
illegal opcodes; another example is Xen's hypercall interface.

On 03/03/2016 07:57, Hervé Poussineau wrote:

This patch is not quiet on some operating systems:
OS/2:
ILLOPC: 000172e1: 0f a6

Windows XP:
ILLOPC: 00020d1a: c4 c4

And very verbose in Windows 3.11, Windows 9x:
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63

Is it normal?


Yes, it is.  As usual, Raymond Chen explains what's going on:

https://blogs.msdn.microsoft.com/oldnewthing/20041215-00/?p=37003


Wow.  That's... interesting.

I think maybe I'll re-do the patch to distinguish between those opcodes that 
are completely unrecognized (which is what I was expecting to find) and those 
that raise #UD due to cpu state (e.g. this arpl in vm86 mode).



r~

[Qemu-devel] [PATCH] Fix bug: SRS instructions would trap to EL3 in Secure EL1 even if specified mode was not monitor mode. [RESUBMIT DUE TO MISSING SIGN-OFF]

2016-03-03 Thread Ralf-Philipp Weinmann

According to the ARMv8 Architecture reference manual [F6.1.203], ALL
of the following conditions need to be met for SRS to trap to EL3:
* It is executed at Secure PL1.
* The specified mode is monitor mode.
* EL3 is using AArch64.

Signed-off-by: Ralf-Philipp Weinmann 
---
 target-arm/translate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index c29c47f..a7688bb 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7582,7 +7582,8 @@ static void gen_srs(DisasContext *s,
 bool undef = false;
 
 /* SRS is:
- * - trapped to EL3 if EL3 is AArch64 and we are at Secure EL1
+ * - trapped to EL3 if EL3 is AArch64 and we are at Secure EL1 and 
+ *   mode is monitor mode
  * - UNDEFINED in Hyp mode
  * - UNPREDICTABLE in User or System mode
  * - UNPREDICTABLE if the specified mode is:
@@ -7592,7 +7593,7 @@ static void gen_srs(DisasContext *s,
  * -- Monitor, if we are Non-secure
  * For the UNPREDICTABLE cases we choose to UNDEF.
  */
-if (s->current_el == 1 && !s->ns) {
+if (s->current_el == 1 && !s->ns && mode == ARM_CPU_MODE_MON) {
 gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(), 3);
 return;
 }
-- 
2.5.4 (Apple Git-61)
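
The three conditions from the commit message can be written as a single
predicate. This is only a sketch: the function name, the explicit
el3_is_aarch64 parameter, and the SKETCH_ prefixed constant are assumptions
for illustration, and the translator code in the patch tests its own state
fields rather than calling anything like this.

```c
#include <stdbool.h>

/* AArch32 Monitor mode encoding (M[4:0] = 0b10110). */
#define SKETCH_ARM_CPU_MODE_MON 0x16

/*
 * Hypothetical predicate: SRS traps to EL3 only when all three conditions
 * hold: EL3 is using AArch64, we execute at Secure EL1, and the mode
 * specified by the instruction is Monitor mode.
 */
static bool srs_traps_to_el3(int current_el, bool non_secure,
                             bool el3_is_aarch64, int mode)
{
    return el3_is_aarch64 &&
           current_el == 1 && !non_secure &&
           mode == SKETCH_ARM_CPU_MODE_MON;
}
```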

[Qemu-devel] [PATCH] Fix bug: SRS instructions would trap to EL3 in Secure EL1 even if specified mode was not monitor mode.

2016-03-03 Thread Ralf-Philipp Weinmann

According to the ARMv8 Architecture reference manual [F6.1.203], ALL
of the following conditions need to be met for SRS to trap to EL3:
* It is executed at Secure PL1.
* The specified mode is monitor mode.
* EL3 is using AArch64.
---
 target-arm/translate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index c29c47f..a7688bb 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7582,7 +7582,8 @@ static void gen_srs(DisasContext *s,
 bool undef = false;
 
 /* SRS is:
- * - trapped to EL3 if EL3 is AArch64 and we are at Secure EL1
+ * - trapped to EL3 if EL3 is AArch64 and we are at Secure EL1 and 
+ *   mode is monitor mode
  * - UNDEFINED in Hyp mode
  * - UNPREDICTABLE in User or System mode
  * - UNPREDICTABLE if the specified mode is:
@@ -7592,7 +7593,7 @@ static void gen_srs(DisasContext *s,
  * -- Monitor, if we are Non-secure
  * For the UNPREDICTABLE cases we choose to UNDEF.
  */
-if (s->current_el == 1 && !s->ns) {
+if (s->current_el == 1 && !s->ns && mode == ARM_CPU_MODE_MON) {
 gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(), 3);
 return;
 }
-- 
2.5.4 (Apple Git-61)

[Qemu-devel] [PATCH] linux-user: Consistently return host errnos from do_openat()

2016-03-03 Thread Peter Maydell

The function do_openat() is not consistent about whether it is
returning a host errno or a guest errno in case of failure.
Standardise on returning -1 with errno set (ie caller has
to call get_errno()).

Signed-off-by: Peter Maydell 
Reported-by: Timothy Edward Baldwin 
---
Timothy's patchset for fixing signal races had a patch which also
addressed this bug.  However I preferred to take the opposite tack
and have the callers do get_errno() rather than the callee, because
it means changes in fewer places and it's generally more natural
for the 'fill' functions that do_openat() calls.
---
 linux-user/syscall.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f9dcdd4..39586e4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5557,7 +5557,9 @@ static int open_self_cmdline(void *cpu_env, int fd)
 
 nb_read = read(fd_orig, buf, sizeof(buf));
 if (nb_read < 0) {
+int e = errno;
 fd_orig = close(fd_orig);
+errno = e;
 return -1;
 } else if (nb_read == 0) {
 break;
@@ -5577,7 +5579,9 @@ static int open_self_cmdline(void *cpu_env, int fd)
 
 if (word_skipped) {
 if (write(fd, cp_buf, nb_read) != nb_read) {
+int e = errno;
 close(fd_orig);
+errno = e;
 return -1;
 }
 }
@@ -5597,7 +5601,7 @@ static int open_self_maps(void *cpu_env, int fd)
 
 fp = fopen("/proc/self/maps", "r");
 if (fp == NULL) {
-return -EACCES;
+return -1;
 }
 
 while ((read = getline(, , fp)) != -1) {
@@ -5741,7 +5745,7 @@ static int open_net_route(void *cpu_env, int fd)
 
 fp = fopen("/proc/net/route", "r");
 if (fp == NULL) {
-return -EACCES;
+return -1;
 }
 
 /* read header */
@@ -5791,7 +5795,7 @@ static int do_openat(void *cpu_env, int dirfd, const char 
*pathname, int flags,
 
 if (is_proc_myself(pathname, "exe")) {
 int execfd = qemu_getauxval(AT_EXECFD);
-return execfd ? execfd : get_errno(sys_openat(dirfd, exec_path, flags, 
mode));
+return execfd ? execfd : sys_openat(dirfd, exec_path, flags, mode);
 }
 
 for (fake_open = fakes; fake_open->filename; fake_open++) {
@@ -5817,7 +5821,9 @@ static int do_openat(void *cpu_env, int dirfd, const char 
*pathname, int flags,
 unlink(filename);
 
 if ((r = fake_open->fill(cpu_env, fd))) {
+int e = errno;
 close(fd);
+errno = e;
 return r;
 }
 lseek(fd, 0, SEEK_SET);
@@ -5825,7 +5831,7 @@ static int do_openat(void *cpu_env, int dirfd, const char 
*pathname, int flags,
 return fd;
 }
 
-return get_errno(sys_openat(dirfd, path(pathname), flags, mode));
+return sys_openat(dirfd, path(pathname), flags, mode);
 }
 
 #define TIMER_MAGIC 0x0caf
-- 
1.9.1
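
The save-and-restore pattern used around close() in this patch can be
isolated in a small helper. The name close_preserving_errno() is made up for
this sketch; the patch open-codes the same three steps at each call site.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: close 'fd' without disturbing errno, so that the
 * caller still sees the error from the operation that failed earlier.
 * close() may itself fail and overwrite errno, hence the save/restore.
 */
static void close_preserving_errno(int fd)
{
    int saved = errno;

    close(fd);
    errno = saved;
}
```

A caller that returns -1 after a failed read() or write() can use this for
cleanup and leave errno for a later get_errno() to pick up.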

[Qemu-devel] [PATCH] linux-user: Check array bounds in errno conversion

2016-03-03 Thread Peter Maydell

From: Timothy E Baldwin 

Check array bounds in host_to_target_errno() and target_to_host_errno().

Signed-off-by: Timothy Edward Baldwin 
Message-id: 1441497448-32489-2-git-send-email-t.e.baldwi...@members.leeds.ac.uk
[PMM: Add a lower-bound check, use braces on if(), tweak commit message]
Signed-off-by: Peter Maydell 
---
This is a bugfix patch fished out of Timothy's signal-race-fixes
patch series. We had a previous go-around doing this with unsigned
integers, but that doesn't work.

 linux-user/syscall.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 9517531..f9dcdd4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -617,15 +617,19 @@ static uint16_t 
host_to_target_errno_table[ERRNO_TABLE_SIZE] = {
 
 static inline int host_to_target_errno(int err)
 {
-if(host_to_target_errno_table[err])
+if (err >= 0 && err < ERRNO_TABLE_SIZE &&
+host_to_target_errno_table[err]) {
 return host_to_target_errno_table[err];
+}
 return err;
 }
 
 static inline int target_to_host_errno(int err)
 {
-if (target_to_host_errno_table[err])
+if (err >= 0 && err < ERRNO_TABLE_SIZE &&
+target_to_host_errno_table[err]) {
 return target_to_host_errno_table[err];
+}
 return err;
 }
 
-- 
1.9.1
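
The bounds check can be tried in isolation with a toy table. The table size,
its contents, and the sketch_ names below are invented for this example; only
the shape of the condition follows the patch.

```c
#include <stdint.h>

#define SKETCH_TABLE_SIZE 8

/* Toy translation table: a zero entry means "no translation needed". */
static const uint16_t sketch_errno_table[SKETCH_TABLE_SIZE] = {
    [2] = 102,
    [5] = 105,
};

/*
 * Translate 'err' through the table. Values outside the table, including
 * negative ones, are returned unchanged instead of indexing out of bounds.
 */
static int sketch_map_errno(int err)
{
    if (err >= 0 && err < SKETCH_TABLE_SIZE &&
        sketch_errno_table[err]) {
        return sketch_errno_table[err];
    }
    return err;
}
```

The lower-bound test matters because 'err' is a signed int: a negative value
passes an upper-bound check alone and would index before the start of the
table.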

Re: [Qemu-devel] [PATCH RFC v2 1/2] Add param Error** to msi_init() & modify the callers

2016-03-03 Thread Michael S. Tsirkin

On Thu, Mar 03, 2016 at 04:03:16PM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Thu, Mar 03, 2016 at 01:19:09PM +0200, Marcel Apfelbaum wrote:
> >> On 03/03/2016 12:45 PM, Michael S. Tsirkin wrote:
> >> >On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
> >> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int 
> >> nr_vectors,
> >> + bool msi64bit, bool msi_per_vector_mask, Error **errp)
> >>   {
> >>   unsigned int vectors_order;
> >> -uint16_t flags;
> >> +uint16_t flags; /* Message Control register value */
> >>   uint8_t cap_size;
> >>   int config_offset;
> >> 
> >>   if (!msi_supported) {
> >> +error_setg(errp, "MSI is not supported by interrupt 
> >> controller");
> >>   return -ENOTSUP;
> >> >>>
> >> >>>First failure mode: board does not support MSI (-ENOTSUP).
> >> >>>
> >> >>>Question to the PCI guys: why is this even an error?  A device with
> >> >>>capability MSI should work just fine in such a board.
> >> >>
> >> >>Hi Markus,
> >> >>
> >> >>Adding Jan Kiszka, maybe he can help.
> >> >>
> >> >>That's a fair question. Is there any history for this decision?
> >> >>The board not supporting MSI has nothing to do with the capability being 
> >> >>there.
> >> >>The HW should not change because the board does not support it.
> >> >>
> >> >>The capability should be present but not active.
> >> >
> >> >Digging in git log will tell you why we have the msi_supported flag:
> >> >
> >> >commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support 
> >> >functions")
> >> >
> >> >  This is a safety measure to avoid breaking platforms which should 
> >> > support
> >> >  MSI-X but currently lack this in the interrupt controller emulation.
> >> >
> >> >in other words, on some platforms we must hide msi support from devices
> >> >because otherwise guests will try to use it, and our emulation is
> >> >incomplete.
> >> 
> >> 
> >> OK, thanks. So the flag should be "msi_broken" or 
> >> "msi_present_but_not_implemented" and not
> >> "msi_supported" that leads (at least me) to the assumption that some 
> >> platform *does not support msi*
> >> rather than it supports it, but we don't emulate it.
> 
> I agree the name is badly misleading for this role.
> 
> Now let me see how this contraption actually works.  msi_supported is
> global, initialized to false, and becomes globally true when
> 
> 1. certain MSI-capable interrupt controllers realize: "apic",
>   "kvm-apic" if kvm_has_gsi_routing(), "xen-apic", "arm-gicv2m",
>   "openpic" models 1 and 2, "kvm-openpic" models 1 and 2
> 
> 2. "s390-pcihost" class-initializes
> 
> 3. machine "spapr-machine" initializes
> 
> Issues:
> 
> * "Global" is problematic.  What if a board has more than one interrupt
>   controller?  What if one of them sets msi_supported, but the other one
>   is of the kind Michael described, i.e. guests know it has MSI, but our
>   emulation doesn't actually work?
> 
> * "Initialize to false" is problematic.  We don't clear msi_supported
>   when we have a broken interrupt controller, we set it when we have a
>   working one.  The consequence is that boards with non-MSI interrupt
>   controllers are treated just like boards with broken interrupt
>   controllers.
> 
>   Here's how msi_supported is documented:
> 
> /* Flag for interrupt controller to declare MSI/MSI-X support */
> bool msi_supported;
> 
>   This matches how the code works.  However, it contradicts the
>   commit message Michael quoted.  The most plausible explanation is that
>   the commit is flawed.
> 
> * Class-initialize (2.) looks wrong to me.  msi_supported becomes true
>   when QOM type "s390-pcihost" is created, regardless of whether
>   instances get created, let alone used.
> 
> * I'm not sure about 3., but the spapr guys can worry about that.
> 
> >> >And the conclusion from that is that for msi_init to fail silently is
> >> >at the moment the right thing to do.
> >> 
> >> But this is not the only thing we do, we are modifying the PCI devices. We 
> >> could fail starting the VM
> >> if a device supporting MSI is added on a platform with broken msi, but 
> >> this will prevent us to use
> >> assigned devices. Emulated devices should be created with a specific 
> >> "msi=off" flag.
> >> 
> >> Thanks,
> >> Marcel
> >
> > That will just break a bunch of valid configurations, for no real
> > benefit to users.
> 
> I disagree, strongly.
> 
> If I ask for msi=on, then QEMU should either give it to me or fail, not
> silently "correct" my configuration.
> 
> Of course, if we have msi_supported = false for everybody and its dog
> whether it's really needed or not, "or fail" will indeed reject "a bunch
> of valid configurations".  We "compensate" by silently messing with the
> user's configuration.  That's shoddy workmanship, sorry.
> 
> We should instead have msi_supported =

Re: [Qemu-devel] [PATCH v8 0/4] i386: expose floppy-related objects in SSDT

2016-03-03 Thread Michael S. Tsirkin

On Thu, Mar 03, 2016 at 06:48:38PM +0300, Roman Kagan wrote:
> On Wed, Mar 02, 2016 at 05:10:58PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Mar 02, 2016 at 06:08:41PM +0300, Denis V. Lunev wrote:
> > > On 02/17/2016 09:25 PM, Roman Kagan wrote:
> > > >Windows on UEFI systems is only capable of detecting the presence and
> > > >the type of floppy drives via corresponding ACPI objects.
> > > >
> > > >Those objects are added in patch 4; the preceding ones pave the way to
> > > >it, by making the necessary data public and by moving the whole floppy
> > > >drive controller description into runtime-generated SSDT.
> > > >
> > > >Roman Kagan (4):
> > > >   i386/acpi: make floppy controller object dynamic
> > > >   i386: expose floppy drive CMOS type
> > > >   fdc: add function to determine drive chs limits
> > > >   i386: populate floppy drive information in DSDT
> > > >
> > > >Signed-off-by: Roman Kagan 
> > > >Cc: Igor Mammedov 
> > > >Cc: "Michael S. Tsirkin" 
> > > >Cc: Marcel Apfelbaum 
> > > >Cc: John Snow 
> > > >Cc: Laszlo Ersek 
> > > >Cc: Kevin O'Connor 
> > > >---
> > > >changes since v7:
> > > >  - rebased to latest master
> > > >  - use drive max c,h,s rather than the current diskette geometry
> > > >
> > > >  hw/block/fdc.c | 23 +
> > > >  hw/i386/acpi-build.c   | 92 
> > > > --
> > > >  hw/i386/pc.c   |  2 +-
> > > >  include/hw/block/fdc.h |  2 ++
> > > >  include/hw/i386/pc.h   |  1 +
> > > >  5 files changed, 94 insertions(+), 26 deletions(-)
> > > >
> > > > Michael, we have obtained a Reviewed-by: from John.
> > > > Is this set good to be accepted, or is your
> > > > last comment mandatory?
> > 
> > Pls do but you can make it a separate patch on top
> > if you prefer.
> 
> Sorry, I must have lost track: I thought that all your concerns had
> been addressed by John's comment.  Can you please point out what issues
> still remain in this patchset that prevent it from being merged?
> 
> Thanks,
> Roman.

it's in my tree so nothing.
I prefer refactoring the loop slightly, by a patch on top.

[Qemu-devel] [PATCH v2 5/5] bcm2835_dma: add emulation of Raspberry Pi DMA controller

2016-03-03 Thread Andrew Baumann

At present, all DMA transfers complete inline (so a looping descriptor
queue will lock up the device). We also do not model pause/abort,
arbitration/priority, or debug features.

Signed-off-by: Andrew Baumann 
---

Notes:
v2:
 * avoid ldl_phys/stl_phys
 * compute address of channel structure only after asserting its validity
 * correctly implement per-channel reset and abort bits
 * set channel paused bit when completing DMA

 hw/arm/bcm2835_peripherals.c |  26 +++
 hw/dma/Makefile.objs |   1 +
 hw/dma/bcm2835_dma.c | 408 +++
 include/hw/arm/bcm2835_peripherals.h |   2 +
 include/hw/dma/bcm2835_dma.h |  47 
 5 files changed, 484 insertions(+)
 create mode 100644 hw/dma/bcm2835_dma.c
 create mode 100644 include/hw/dma/bcm2835_dma.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 9b9de99..eeb4934 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -89,6 +89,14 @@ static void bcm2835_peripherals_init(Object *obj)
 object_initialize(&s->sdhci, sizeof(s->sdhci), TYPE_SYSBUS_SDHCI);
 object_property_add_child(obj, "sdhci", OBJECT(&s->sdhci), NULL);
 qdev_set_parent_bus(DEVICE(&s->sdhci), sysbus_get_default());
+
+/* DMA Channels */
+object_initialize(&s->dma, sizeof(s->dma), TYPE_BCM2835_DMA);
+object_property_add_child(obj, "dma", OBJECT(&s->dma), NULL);
+qdev_set_parent_bus(DEVICE(&s->dma), sysbus_get_default());
+
+object_property_add_const_link(OBJECT(&s->dma), "dma-mr",
+   OBJECT(&s->gpu_bus_mr), &error_abort);
 }
 
 static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
@@ -257,6 +265,24 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+/* DMA Channels */
+object_property_set_bool(OBJECT(&s->dma), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->peri_mr, DMA_OFFSET,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dma), 0));
+memory_region_add_subregion(&s->peri_mr, DMA15_OFFSET,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dma), 1));
+
+for (n = 0; n <= 12; n++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->dma), n,
+   qdev_get_gpio_in_named(DEVICE(&s->ic),
+  BCM2835_IC_GPU_IRQ,
+  INTERRUPT_DMA0 + n));
+}
 }
 
 static void bcm2835_peripherals_class_init(ObjectClass *oc, void *data)
diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index 0e65ed0..a1abbcf 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -11,3 +11,4 @@ common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
 
 obj-$(CONFIG_OMAP) += omap_dma.o soc_dma.o
 obj-$(CONFIG_PXA2XX) += pxa2xx_dma.o
+obj-$(CONFIG_RASPI) += bcm2835_dma.o
diff --git a/hw/dma/bcm2835_dma.c b/hw/dma/bcm2835_dma.c
new file mode 100644
index 000..c7ce4e4
--- /dev/null
+++ b/hw/dma/bcm2835_dma.c
@@ -0,0 +1,408 @@
+/*
+ * Raspberry Pi emulation (c) 2012 Gregory Estrade
+ * This code is licensed under the GNU GPLv2 and later.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/dma/bcm2835_dma.h"
+
+/* DMA CS Control and Status bits */
+#define BCM2708_DMA_ACTIVE      (1 << 0)
+#define BCM2708_DMA_END         (1 << 1) /* GE */
+#define BCM2708_DMA_INT         (1 << 2)
+#define BCM2708_DMA_ISPAUSED    (1 << 4)  /* Pause requested or not active */
+#define BCM2708_DMA_ISHELD      (1 << 5)  /* Is held by DREQ flow control */
+#define BCM2708_DMA_ERR         (1 << 8)
+#define BCM2708_DMA_ABORT       (1 << 30) /* stop current CB, go to next, WO */
+#define BCM2708_DMA_RESET       (1 << 31) /* WO, self clearing */
+
+/* DMA control block "info" field bits */
+#define BCM2708_DMA_INT_EN      (1 << 0)
+#define BCM2708_DMA_TDMODE      (1 << 1)
+#define BCM2708_DMA_WAIT_RESP   (1 << 3)
+#define BCM2708_DMA_D_INC       (1 << 4)
+#define BCM2708_DMA_D_WIDTH     (1 << 5)
+#define BCM2708_DMA_D_DREQ      (1 << 6)
+#define BCM2708_DMA_D_IGNORE    (1 << 7)
+#define BCM2708_DMA_S_INC       (1 << 8)
+#define BCM2708_DMA_S_WIDTH     (1 << 9)
+#define BCM2708_DMA_S_DREQ      (1 << 10)
+#define BCM2708_DMA_S_IGNORE    (1 << 11)
+
+/* Register offsets */
+#define BCM2708_DMA_CS          0x00 /* Control and Status */
+#define BCM2708_DMA_ADDR        0x04 /* Control block address */
+/* the current control block appears in the following registers - read only */
+#define BCM2708_DMA_INFO        0x08
+#define BCM2708_DMA_SOURCE_AD   0x0c
+#define BCM2708_DMA_DEST_AD     0x10
+#define BCM2708_DMA_TXFR_LEN    0x14
+#define BCM2708_DMA_STRIDE      0x18
+#define BCM2708_DMA_NEXTCB      0x1C
+#define BCM2708_DMA_DEBUG       0x20
+
+#define BCM2708_DMA_INT_STATUS  0xfe0 /* Interrupt status of each channel */
+#define BCM2708_DMA_ENABLE  0xff0 /* Global enable bits for

[Qemu-devel] [PATCH v2 4/5] bcm2835_property: implement framebuffer control/configuration properties

2016-03-03 Thread Andrew Baumann

The property channel driver now interfaces with the framebuffer device
to query and set framebuffer parameters. As a result of this, the "get
ARM RAM size" query now correctly returns the video RAM base address
(not total RAM size), and the ram-size property is no longer relevant
here.

Signed-off-by: Andrew Baumann 
---

Notes:
v2:
 * avoid ldl/stl_phys
 * move code to increase default pi2 memory size from preceding patch here
   (it was incorrect without the property channel implementation changes)

 hw/arm/bcm2835_peripherals.c   |   8 +--
 hw/arm/raspi.c |   7 +-
 hw/misc/bcm2835_property.c | 139 -
 include/hw/misc/bcm2835_property.h |   5 +-
 4 files changed, 144 insertions(+), 15 deletions(-)

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 552611a..9b9de99 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -80,6 +80,8 @@ static void bcm2835_peripherals_init(Object *obj)
   "board-rev", &error_abort);
 qdev_set_parent_bus(DEVICE(&s->property), sysbus_get_default());
 
+object_property_add_const_link(OBJECT(&s->property), "fb",
+   OBJECT(&s->fb), &error_abort);
 object_property_add_const_link(OBJECT(&s->property), "dma-mr",
OBJECT(&s->gpu_bus_mr), &error_abort);
 
@@ -210,12 +212,6 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
qdev_get_gpio_in(DEVICE(&s->mboxes), MBOX_CHAN_FB));
 
 /* Property channel */
-object_property_set_int(OBJECT(&s->property), ram_size, "ram-size", &err);
-if (err) {
-error_propagate(errp, err);
-return;
-}
-
 object_property_set_bool(OBJECT(&s->property), true, "realized", &err);
 if (err) {
 error_propagate(errp, err);
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 5498209..83fe809 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -164,11 +164,6 @@ static void raspi2_machine_init(MachineClass *mc)
 mc->no_floppy = 1;
 mc->no_cdrom = 1;
 mc->max_cpus = BCM2836_NCPUS;
-
-/* XXX: Temporary restriction in RAM size from the full 1GB. Since
- * we do not yet support the framebuffer / GPU, we need to limit
- * RAM usable by the OS to sit below the peripherals.
- */
-mc->default_ram_size = 0x3F00; /* BCM2836_PERI_BASE */
+mc->default_ram_size = 1024 * 1024 * 1024;
 };
 DEFINE_MACHINE("raspi2", raspi2_machine_init)
diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c
index 41fbbe3..15dcc02 100644
--- a/hw/misc/bcm2835_property.c
+++ b/hw/misc/bcm2835_property.c
@@ -17,6 +17,11 @@ static void bcm2835_property_mbox_push(BCM2835PropertyState 
*s, uint32_t value)
 uint32_t tot_len;
 size_t resplen;
 uint32_t tmp;
+int n;
+uint32_t offset, length, color;
+uint32_t xres, yres, xoffset, yoffset, bpp, pixo, alpha;
+uint32_t *newxres = NULL, *newyres = NULL, *newxoffset = NULL,
+*newyoffset = NULL, *newbpp = NULL, *newpixo = NULL, *newalpha = NULL;
 
 value &= ~0xf;
 
@@ -60,7 +65,14 @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value)
 /* base */
 stl_le_phys(&s->dma_as, value + 12, 0);
 /* size */
-stl_le_phys(&s->dma_as, value + 16, s->ram_size);
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->vcram_base);
+resplen = 8;
+break;
+case 0x00010006: /* Get VC memory */
+/* base */
+stl_le_phys(&s->dma_as, value + 12, s->fbdev->vcram_base);
+/* size */
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->vcram_size);
 resplen = 8;
 break;
 case 0x00028001: /* Set power state */
@@ -122,6 +134,114 @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value)
 resplen = 8;
 break;
 
+/* Frame buffer */
+
+case 0x00040001: /* Allocate buffer */
+stl_le_phys(&s->dma_as, value + 12, s->fbdev->base);
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->size);
+resplen = 8;
+break;
+case 0x00048001: /* Release buffer */
+resplen = 0;
+break;
+case 0x00040002: /* Blank screen */
+resplen = 4;
+break;
+case 0x00040003: /* Get display width/height */
+case 0x00040004:
+stl_le_phys(&s->dma_as, value + 12, s->fbdev->xres);
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->yres);
+resplen = 8;
+break;
+case 0x00044003: /* Test display width/height */
+case 0x00044004:
+resplen = 8;
+break;
+case 0x00048003: /* Set display width/height */
+case 0x00048004:
+xres = ldl_le_phys(&s->dma_as, value + 12);
+newxres = 
+

[Qemu-devel] [PATCH v2 2/5] bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block

2016-03-03 Thread Andrew Baumann

At present only the core UART functions (data path for tx/rx) are
implemented, which is enough for UEFI to boot. The following
features/registers are unimplemented:
  * Line/modem control
  * Scratch register
  * Extra control
  * Baudrate
  * SPI interfaces

Signed-off-by: Andrew Baumann 
---

Notes:
v2:
 * document unimplemented features, log unimplemented register accesses
 * model read path as 8-bit
 * drop incorrect event (break detection) functionality
 * implement AUX_IRQ register
 * model interrupt enables as a uint8 rather than 2 bools
 * corrected bugs in implementation of IIR bits 0-2
 * use chardev prop rather than qemu_char_get_next_serial(); the
   soc-level (bcm2835_peripherals) handling is still a little messy
   however, because uart0 is a pl011 device which still calls
   qemu_char_get_next_serial()

 hw/arm/bcm2835_peripherals.c |  29 
 hw/char/Makefile.objs|   1 +
 hw/char/bcm2835_aux.c| 316 +++
 include/hw/arm/bcm2835_peripherals.h |   2 +
 include/hw/char/bcm2835_aux.h|  33 
 5 files changed, 381 insertions(+)
 create mode 100644 hw/char/bcm2835_aux.c
 create mode 100644 include/hw/char/bcm2835_aux.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 6ce9cd1..375e341 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -12,6 +12,8 @@
 #include "hw/arm/bcm2835_peripherals.h"
 #include "hw/misc/bcm2835_mbox_defs.h"
 #include "hw/arm/raspi_platform.h"
+#include "sysemu/char.h"
+#include "sysemu/sysemu.h" /* for serial_hds */
 
 /* Peripheral base address on the VC (GPU) system bus */
 #define BCM2835_VC_PERI_BASE 0x7e000000
@@ -48,6 +50,11 @@ static void bcm2835_peripherals_init(Object *obj)
 object_property_add_child(obj, "uart0", OBJECT(s->uart0), NULL);
 qdev_set_parent_bus(DEVICE(s->uart0), sysbus_get_default());
 
+/* AUX / UART1 */
+object_initialize(&s->aux, sizeof(s->aux), TYPE_BCM2835_AUX);
+object_property_add_child(obj, "aux", OBJECT(&s->aux), NULL);
+qdev_set_parent_bus(DEVICE(&s->aux), sysbus_get_default());
+
 /* Mailboxes */
 object_initialize(&s->mboxes, sizeof(s->mboxes), TYPE_BCM2835_MBOX);
 object_property_add_child(obj, "mbox", OBJECT(&s->mboxes), NULL);
@@ -79,6 +86,7 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
 MemoryRegion *ram;
 Error *err = NULL;
 uint32_t ram_size;
+CharDriverState *chr;
 int n;
 
 obj = object_property_get_link(OBJECT(dev), "ram", &err);
@@ -131,6 +139,27 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
 qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
INTERRUPT_UART));
 
+/* AUX / UART1 */
+/* XXX: pl011 (uart0) uses qemu_char_get_next_serial(), so at this point it
+ * should have claimed the first serial device (if one exists) */
+chr = serial_hds[1];
+if (chr == NULL) {
+chr = qemu_chr_new("bcm2835.uart1", "null", NULL);
+}
+qdev_prop_set_chr(DEVICE(&s->aux), "chardev", chr);
+
+object_property_set_bool(OBJECT(&s->aux), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->peri_mr, UART1_OFFSET,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->aux), 0));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->aux), 0,
+qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
+   INTERRUPT_AUX));
+
 /* Mailboxes */
 object_property_set_bool(OBJECT(&s->mboxes), true, "realized", &err);
 if (err) {
diff --git a/hw/char/Makefile.objs b/hw/char/Makefile.objs
index 5931cc8..69a553c 100644
--- a/hw/char/Makefile.objs
+++ b/hw/char/Makefile.objs
@@ -16,6 +16,7 @@ obj-$(CONFIG_SH4) += sh_serial.o
 obj-$(CONFIG_PSERIES) += spapr_vty.o
 obj-$(CONFIG_DIGIC) += digic-uart.o
 obj-$(CONFIG_STM32F2XX_USART) += stm32f2xx_usart.o
+obj-$(CONFIG_RASPI) += bcm2835_aux.o
 
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_ser.o
 common-obj-$(CONFIG_ISA_DEBUG) += debugcon.o
diff --git a/hw/char/bcm2835_aux.c b/hw/char/bcm2835_aux.c
new file mode 100644
index 0000000..0394d11
--- /dev/null
+++ b/hw/char/bcm2835_aux.c
@@ -0,0 +1,316 @@
+/*
+ * BCM2835 (Raspberry Pi / Pi 2) Aux block (mini UART and SPI).
+ * Copyright (c) 2015, Microsoft
+ * Written by Andrew Baumann
+ * Based on pl011.c, copyright terms below:
+ *
+ * Arm PrimeCell PL011 UART
+ *
+ * Copyright (c) 2006 CodeSourcery.
+ * Written by Paul Brook
+ *
+ * This code is licensed under the GPL.
+ *
+ * At present only the core UART functions (data path for tx/rx) are
+ * implemented. The following features/registers are unimplemented:
+ *  - Line/modem control
+ *  - Scratch register
+ *  - Extra control
+ *  - Baudrate
+ *  - SPI interfaces
+ */
+
+#include "qemu/osdep.h"
+#include "hw/char/bcm2835_aux.h"
+

[Qemu-devel] [PATCH v2 0/5] Raspberry Pi framebuffer, DMA and Windows support

2016-03-03 Thread Andrew Baumann

This patch series adds support for the AUX (second UART), framebuffer
and DMA controller on Raspberry Pi 2, and enables booting Windows on
this device. As with the previous series, it is heavily based on the
original (out of tree) work of Gregory Estrade, Stefan Weil and others
to support Raspberry Pi 1.

After this series, it is possible to boot Windows by following the
instructions at https://github.com/0xabu/qemu/wiki. You can also boot
Raspbian to the GUI using a command such as:

  qemu-system-arm -M raspi2 -kernel raspbian-boot/kernel7.img -sd
  2015-09-24-raspbian-jessie.img -append "rw earlyprintk loglevel=8
  console=ttyAMA0 dwc_otg.lpm_enable=0 root=/dev/mmcblk0p2 rootwait"
  -dtb raspbian-boot/bcm2709-rpi-2-b.dtb -serial stdio

I plan to add USB, and remaining timers / system devices in
future patch series, along with support for pi1 (bcm2835). In the
meantime, the complete code is available at https://github.com/0xabu/qemu

v2:
 * added DMA controller
 * revised per PMM's review feedback
 * rebased on top of the patch for fixing ldl_phys/stl_phys in raspi
   devices, presently in target-arm.next

Cheers,
Andrew

Andrew Baumann (5):
  bcm2835_peripherals: enable sdhci pending-insert quirk for raspberry
pi
  bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block
  bcm2835_fb: add framebuffer device for Raspberry Pi
  bcm2835_property: implement framebuffer control/configuration
properties
  bcm2835_dma: add emulation of Raspberry Pi DMA controller

 hw/arm/bcm2835_peripherals.c | 100 -
 hw/arm/bcm2836.c |   2 +
 hw/arm/raspi.c   |  12 +-
 hw/char/Makefile.objs|   1 +
 hw/char/bcm2835_aux.c| 316 ++
 hw/display/Makefile.objs |   1 +
 hw/display/bcm2835_fb.c  | 424 +++
 hw/dma/Makefile.objs |   1 +
 hw/dma/bcm2835_dma.c | 408 +
 hw/misc/bcm2835_property.c   | 139 +++-
 include/hw/arm/bcm2835_peripherals.h |   6 +
 include/hw/char/bcm2835_aux.h|  33 +++
 include/hw/display/bcm2835_fb.h  |  47 
 include/hw/dma/bcm2835_dma.h |  47 
 include/hw/misc/bcm2835_property.h   |   5 +-
 15 files changed, 1529 insertions(+), 13 deletions(-)
 create mode 100644 hw/char/bcm2835_aux.c
 create mode 100644 hw/display/bcm2835_fb.c
 create mode 100644 hw/dma/bcm2835_dma.c
 create mode 100644 include/hw/char/bcm2835_aux.h
 create mode 100644 include/hw/display/bcm2835_fb.h
 create mode 100644 include/hw/dma/bcm2835_dma.h

-- 
2.5.3

[Qemu-devel] [PATCH v2 3/5] bcm2835_fb: add framebuffer device for Raspberry Pi

2016-03-03 Thread Andrew Baumann

The framebuffer occupies the upper portion of memory (64MiB by
default), but it can only be controlled/configured via a system
mailbox or property channel (to be added by a subsequent patch).
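
As a rough illustration of that layout (a sketch, not code from this
patch): the realize function in this patch sets "vcram-base" to
ram_size - vcram_size, so the framebuffer region sits at the top of RAM.
The helper name vcram_base_for below is hypothetical and exists only to
show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: base address of the
 * VideoCore/framebuffer region when it occupies the top vcram_size
 * bytes of a RAM block of ram_size bytes. */
uint32_t vcram_base_for(uint32_t ram_size, uint32_t vcram_size)
{
    assert(vcram_size <= ram_size);   /* the region must fit inside RAM */
    return ram_size - vcram_size;
}
```

Under these assumptions, 1GiB of RAM with the default 64MiB reservation
gives a base of 0x3C000000, which is also the amount of RAM that
setup_boot() is told about in the raspi.c hunk of this patch.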

Signed-off-by: Andrew Baumann 
---

Notes:
v2:
 * avoid ldl_phys
 * move code to increase default pi2 memory size back to the final patch

 hw/arm/bcm2835_peripherals.c |  38 +++-
 hw/arm/bcm2836.c |   2 +
 hw/arm/raspi.c   |   5 +-
 hw/display/Makefile.objs |   1 +
 hw/display/bcm2835_fb.c  | 424 +++
 include/hw/arm/bcm2835_peripherals.h |   2 +
 include/hw/display/bcm2835_fb.h  |  47 
 7 files changed, 517 insertions(+), 2 deletions(-)
 create mode 100644 hw/display/bcm2835_fb.c
 create mode 100644 include/hw/display/bcm2835_fb.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 375e341..552611a 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -63,6 +63,16 @@ static void bcm2835_peripherals_init(Object *obj)
 object_property_add_const_link(OBJECT(&s->mboxes), "mbox-mr",
 OBJECT(&s->mbox_mr), &error_abort);
 
+/* Framebuffer */
+object_initialize(&s->fb, sizeof(s->fb), TYPE_BCM2835_FB);
+object_property_add_child(obj, "fb", OBJECT(&s->fb), NULL);
+object_property_add_alias(obj, "vcram-size", OBJECT(&s->fb), "vcram-size",
+  &error_abort);
+qdev_set_parent_bus(DEVICE(&s->fb), sysbus_get_default());
+
+object_property_add_const_link(OBJECT(&s->fb), "dma-mr",
+   OBJECT(&s->gpu_bus_mr), &error_abort);
+
 /* Property channel */
 object_initialize(&s->property, sizeof(s->property), TYPE_BCM2835_PROPERTY);
 object_property_add_child(obj, "property", OBJECT(&s->property), NULL);
@@ -85,7 +95,7 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
 Object *obj;
 MemoryRegion *ram;
 Error *err = NULL;
-uint32_t ram_size;
+uint32_t ram_size, vcram_size;
 CharDriverState *chr;
 int n;
 
@@ -173,6 +183,32 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
 qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_ARM_IRQ,
 INTERRUPT_ARM_MAILBOX));
 
+/* Framebuffer */
+vcram_size = (uint32_t)object_property_get_int(OBJECT(s), "vcram-size",
+   &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+object_property_set_int(OBJECT(&s->fb), ram_size - vcram_size,
+"vcram-base", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+object_property_set_bool(OBJECT(&s->fb), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->mbox_mr, MBOX_CHAN_FB << MBOX_AS_CHAN_SHIFT,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->fb), 0));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->fb), 0,
+   qdev_get_gpio_in(DEVICE(&s->mboxes), MBOX_CHAN_FB));
+
 /* Property channel */
 object_property_set_int(OBJECT(&s->property), ram_size, "ram-size", &err);
 if (err) {
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 0321439..89a6b35 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -42,6 +42,8 @@ static void bcm2836_init(Object *obj)
   &error_abort);
 object_property_add_alias(obj, "board-rev", OBJECT(&s->peripherals),
   "board-rev", &error_abort);
+object_property_add_alias(obj, "vcram-size", OBJECT(&s->peripherals),
+  "vcram-size", &error_abort);
 qdev_set_parent_bus(DEVICE(&s->peripherals), sysbus_get_default());
 }
 
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 6582279..5498209 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -113,6 +113,7 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
 static void raspi2_init(MachineState *machine)
 {
 RasPiState *s = g_new0(RasPiState, 1);
+uint32_t vcram_size;
 DriveInfo *di;
 BlockBackend *blk;
 BusState *bus;
@@ -149,7 +150,9 @@ static void raspi2_init(MachineState *machine)
 qdev_prop_set_drive(carddev, "drive", blk, &error_fatal);
 object_property_set_bool(OBJECT(carddev), true, "realized", &error_fatal);
 
-setup_boot(machine, 2, machine->ram_size);
+vcram_size = object_property_get_int(OBJECT(&s->soc), "vcram-size",
+ &error_abort);
+setup_boot(machine, 2, machine->ram_size - vcram_size);
 }
 
 static void raspi2_machine_init(MachineClass *mc)
diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
index f0cf431..d99780e 100644
--- a/hw/display/Makefile.objs
+++ b/hw/display/Makefile.objs
@@ -27,6 +27,7 @@ endif
 obj-$(CONFIG_OMAP) += omap_dss.o
 obj-$(CONFIG_OMAP) +=

[Qemu-devel] [PATCH v2 1/5] bcm2835_peripherals: enable sdhci pending-insert quirk for raspberry pi

2016-03-03 Thread Andrew Baumann

Reviewed-by: Peter Maydell 
Signed-off-by: Andrew Baumann 
---
 hw/arm/bcm2835_peripherals.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 6d66fa0..6ce9cd1 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -171,6 +171,13 @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+object_property_set_bool(OBJECT(&s->sdhci), true, "pending-insert-quirk",
+ &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 object_property_set_bool(OBJECT(&s->sdhci), true, "realized", &err);
 if (err) {
 error_propagate(errp, err);
-- 
2.5.3

Re: [Qemu-devel] [PATCH] input-keymap.c: Add keypad equal and power keys

2016-03-03 Thread Peter Maydell

On 3 March 2016 at 17:55, Programmingkid  wrote:
>
> On Mar 3, 2016, at 10:49 AM, Gerd Hoffmann wrote:
>> Of course, when emulating a x86 guest with ps/2 keyboard you still run
>> into the problem that there might be no ps/2 scancode for certain keys.
>> But there is nothing we can do about that.  Inventing random scancodes
>> wouldn't make guests interpret them as expected ...

> I always thought the whole point to QKeyCode was to provide a
> platform neutral keycode implementation. Your description of it makes
> it sound like it is really just PS/2 keycodes under a different name.

No, it is a neutral keycode implementation. If the keyboard being emulated
is a PS/2 keyboard then obviously you can't actually send the guest
keycodes that don't exist in the PS/2 protocol. That's a limitation
of the hardware being emulated, not of QEMU.
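
To make that distinction concrete, here is a minimal sketch. It uses
hypothetical names (DemoKey, demo_key_to_ps2) and an illustrative table;
it is not QEMU's QKeyCode enum or its input API. The neutral key
identifier exists for every key, while the per-protocol table may simply
have no entry, and the lookup reports that instead of inventing a code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device-neutral key identifiers (not QEMU's QKeyCode). */
typedef enum {
    DEMO_KEY_A,
    DEMO_KEY_EQUAL,
    DEMO_KEY_KP_EQUALS,
    DEMO_KEY__MAX
} DemoKey;

/* Illustrative per-protocol table; 0 means "no code in this protocol". */
static const uint8_t demo_ps2_code[DEMO_KEY__MAX] = {
    [DEMO_KEY_A]     = 0x1e,
    [DEMO_KEY_EQUAL] = 0x0d,
    /* DEMO_KEY_KP_EQUALS is deliberately left unmapped */
};

/* Returns true and stores the code if the key exists in the emulated
 * protocol; returns false rather than silently defaulting to 0. */
bool demo_key_to_ps2(DemoKey key, uint8_t *code)
{
    assert(code != NULL);
    if ((unsigned)key >= DEMO_KEY__MAX || demo_ps2_code[key] == 0) {
        return false;
    }
    *code = demo_ps2_code[key];
    return true;
}
```

In this sketch a device model that consumes the neutral identifier
directly never touches the table, so an unmapped key only matters to the
protocol that lacks it.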

thanks
-- PMM

Re: [Qemu-devel] [PATCH] input-keymap.c: Add keypad equal and power keys

2016-03-03 Thread Programmingkid


On Mar 3, 2016, at 10:49 AM, Gerd Hoffmann wrote:

>  Hi,
> 
>>> number is modeled after pc scancodes, so you can't just pick random
>>> numbers.
>> 
>> Really? I thought the only requirement was each scancode had to be unique. 
> 
> No, it's not.  ps2 emulation assumes those codes are the real ones.
> 
>>> So, if there are no scancodes for the keys you want handle, you can now
>>> drop the scancodes from the workflow.
>> 
>> Are you saying not to add the power and keypad equal keys to the 
>> input-keymap.c  file?
> 
> If standard scancodes exist for them we can add them.  Needs some care
> though, ps2 keyboards have different modes and different keymaps in each
> mode.
> 
>>> Switch cocoa to generate and
>>> submit qkeycodes.
>> 
>> This is already done.
> 
> Good.
> 
>>>  Switch the apple keyboard(s) to accept qkeycodes (see
>>> yesterdays mail on adb keyboard).
>> 
>> On my to-do list.
> 
> Good.
> 
>>> Then the key events from the host
>>> keyboard are forwarded to the guest without ever being converted into pc
>>> scancodes.
>> 
>> How do I do this? You said to use qemu_input_event_send_key_qcode() to
>> send QKeyCodes to QEMU. Is this what you still want?
> 
> Yes.
> 
>> Eventually wouldn't the qcode_to_number array in input-keymap.c try to
>> translate the keypad equals to a ps/2 value?
> 
> Depends on what the emulated device is doing.
> 
> Devices which still use the old input interface to register a input
> handler will get scancodes (example: current adb code).
> 
> Devices which are switched over to the new input interface will receive
> InputEvent *evt.  Then they can then use either
> qemu_input_key_value_to_scancode() to translate the event into a
> sequence of scancodes (Example: ps2 keyboard).  Or they can use
> qemu_input_key_value_to_qcode() to get a qkeycode.
> 
> So, with cocoa using qemu_input_event_send_key_qcode() and adb using
> qemu_input_key_value_to_qcode() the keys are never translated into
> scancodes, and they'll work fine even without a scancode being assigned
> to them in the qcode <-> number (aka scancode) translation maps.
> 
>> If the keypad equals key isn't set in this array, the array might
>> return a default value of 0 and the user will see 'a' printed whenever
>> the keypad equals key is pushed.
> 
> Once the adb code is switched over to use qemu_input_key_value_to_qcode
> this will stop happening.
> 
> Of course, when emulating a x86 guest with ps/2 keyboard you still run
> into the problem that there might be no ps/2 scancode for certain keys.
> But there is nothing we can do about that.  Inventing random scancodes
> wouldn't make guests interpret them as expected ...

Well, the evidence would say otherwise. For the Macintosh keyboard's keypad 
equals key, we could make it so the other equals key is used in its place. The 
only time the user would have any issues is when he or she holds down the shift 
key and pushes the keypad equals button. But who would do that? Logically equal 
equals equal. 

I always thought the whole point to QKeyCode was to provide a platform neutral 
keycode implementation. Your description of it makes it sound like it is really 
just PS/2 keycodes under a different name.

Re: [Qemu-devel] [PATCH v7 5/6] s390x/cpu: Add error handling to cpu creation

2016-03-03 Thread Matthew Rosato


>> +S390CPU *s390_new_cpu(MachineState *machine, int64_t id, Error **errp)
>> +{
>> +S390CPU *cpu = NULL;
>> +Error *local_err = NULL;
> 
> Think the naming schema is "err" now.
> 
>> +
>> +if (id >= max_cpus) {
>> +error_setg(errp, "Unable to add CPU: %" PRIi64
>> +   ", max allowed: %d", id, max_cpus - 1);
>> +goto out;
> 
> Could we also move this check to the realize function?
> 
>> +}
>> +
>> +cpu = cpu_s390x_create(machine->cpu_model, &local_err);
>> +if (local_err != NULL) {
>> +goto out;
>> +}
>> +
>> +object_property_set_int(OBJECT(cpu), id, "id", &local_err);
> 
> We should add a check in between
> 
> if (err) {
> goto out;
> }
> 
>> +object_property_set_bool(OBJECT(cpu), true, "realized", &local_err);
>> +
>> +out:
>> +if (cpu != NULL) {
>> +object_unref(OBJECT(cpu));
> 
> Is the object_unref() here correct?
> I know that we have one reference from VCPU creation. Where does the second one
> come from (is it from the hotplug handler? then I'd prefer a comment here :D )
> 

After some digging, I believe this unref is not necessary for s390
(bus-less) and I'm now questioning the i386 code that I used as a base...

@Igor/Andreas:

In i386, looks like the unrefs were due to the ref created when adding
the cpu to the icc bus.  Andreas moved the checks outside of pc_new_cpu
and explains their purpose here:
0e3bd562 - pc: Ensure non-zero CPU ref count after attaching to ICC bus

But then a subsequent patch removed the bus and left the unrefs:
46232aaa - cpu/apic: drop icc bus/bridge

Should that patch not have also dropped the unrefs in pc_hot_add_cpu()
and pc_cpus_init()?

Matt
